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The multigrid methodology is reviewed. By integrating numerical processes at all scales of a problem, it seeks to 
perform various computational tasks at a cost that rises as slowly as possible as a function of n, the number of 
degrees of freedom in the problem. Current and potential benefits for lattice field computations are outlined. They 
include: 0(n) solution of Dirac equations; just O(l) operations in updating the solution (upon any local change of 
data, including the gauge field); similar efficiency in gauge fixing and updating; O(l) operations in updating the 
inverse matrix and in calculating the change in the logarithm of its determinant; O(n) operations per producing each 
independent configuration in statistical simulations (eliminating CSD), and, more important, effectively just O(l) 
operations per each independent measurement (eliminating the volume factor as well). These potential capabilities 
have been demonstrated on simple model problems. Extensions to real life are explored. 



1. Elementary Acquaintance with 
Multigrid 

1.1. Particle minimization problem 

To introduce some of the basic concepts of 
multi-scale computations, consider the simple 
example where one wishes to calculate the ef- 
fect of an external field on the stationary state 
of a piece of solid made of n classical atoms. De- 
note by n = (^1,^2,^3) the coordinates of the 
i-th atom, by r = (r\,T2,-- ■ ,r n ) the vector of 
all atom positions (the configuration) and by 

^(r)=5>«(l r <- r *D (1-1) 
the energy of their mutual interactions. Let r° = 
,r°) be the given steady state in the 



taincd when external forces / = (/1, f%, . . . , f n ) 
are added, i.e., 



E(r*) = min E(r) 

r 

where, for example, 
E(r)=E°(r)-'£f i : 



(1.2) 



(1.3) 



The computational problem of fast evaluation, 
for any given r, of E(r), or of the residual forces 

dE 



VE{r) 



dr 



-(r); i = 1,2,. ..,n; \x = 1,2,3 



( r r o 

absence of an external field, i.e., 
E°{r°) = mmE°{r), 

r 

entailing dE°(r°)/dr ifl = (i = 1, 2, . . . , n; fi = 
1, 2, 3). One wishes to calculate the state r* ob- 



in case far interactions are significant (e.g., elec- 
trostatic forces) is mentioned in Sec. 6. Here we 
confine our attention to the problem of finding 
r*. 

1.2. Relaxation 

A general approach for calculating r* is the 
particle-by-particle minimization or relaxation. 
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At each step of this process, the position of only 
one particle, r 4 say, is changed, keeping the posi- 
tion of all others fixed. The new value of rj is cho- 
sen so as to reduce E(r) as much as possible. Re- 
peating this step for all particles (i = 1, 2, . . . , n) 
is called a relaxation sweep. By performing a suf- 
ficiently long sequence of relaxation sweeps, one 
hopes to be able to get as close to r* as one 
wishes. 

One main difficulty with the relaxation pro- 
cess is its extremely slow convergence. The rea- 
son is that, when all other particles arc held 
fixed, Ti can change only slightly, only a frac- 
tion of the typical distance between neighboring 
atoms, before the energy E(r) starts to rise (very 
sharply). Hence, very many relaxation sweeps 
will be needed to obtain a new configuration r 
(a new shape of the solid) macroscopically differ- 
ent from r°. (Another possible difficulty — the 
danger of converging to the wrong solution — is 
related to global optimization; cf. Sec. 6). 

To be sure, if the external forces on neigh- 
boring particles are very different from each 
other, the first few relaxation sweeps may ex- 
hibit fast local adjustments of the configura- 
tion, rapidly yielding configurations with possi- 
bly much smaller residual forces VE(r). But the 
advance thereafter toward large-scale changes 
will be painfully slow, eventually exhibiting also 
very slow further reduction of the residual forces. 
The slowness clearly increases with the size n: 
the more atoms in the system, the more relax- 
ation sweeps that are needed to achieve reason- 
able convergence. 

1.3. Multiscale relaxation 

Moving only one particle at a time being so 
inefficient, ways are evidently needed to perform 
collective motions of atoms. 

A collective displacement on scale h, say, with 



center Xk = {xki 1 Xk2 1 Xkz), and amplitude itk — 
(ttfci, Ufc2, U/t3) and shape function Wk(C) can be 
defined by the replacement 

n <— n + 5r t , (1 < i < n) 

where 

Sri = Wk h Xk ^ju k . (1.4) 

The shape function Wk is chosen so that Wk(0) = 
1, while w k (() = for all |C| > C, where C = 
(Ci,C2,C3), ICI = max(|Ci|, |C 2 |, IC3 1) and C is a 
small integer (often C = 1). Hence Sri = for 
\r% — Xk\ > Ch; i.e., only atoms at distance O(h) 
from the center are actually moved. The shape 
function can often be chosen independently of 
fc; a typical shape is the "pyramid" Wfc(C) = 1 — 
|(| . The displacement described by it affects only 
atoms occupying a,2hx2hx2h cube; it will leave 
all of them within that cube as long as |iifc| = 
max(|ufci|, |u fe2 |, |u fc3 | < h. 

A scale h relaxation step is performed at a 
point Xk by choosing Uk so as to reduce E(r + Sr) 
as far as possible (or as far as convenient and 
practical to inexpensively calculate. Since this is 
only one step in an iterative process, it would 
often be a major waste of effort to calculate that 
life which actually minimizes E(r + Sr).) A re- 
laxation sweep on scale h is a sequence of such 
steps, with Xk scanning the gridpoints x±, X2, ■ ■ ■ 
of a grid (lattice) with meshsize h placed over 
the domain occupied by the atoms. 

What scale h should be chosen for the move- 
ments? For movements on a small scale h, com- 
parable to the inter-atomic distances, a slow- 
down similar to that of the particle-by-particle 
minimization will take place. Indeed, to reduce 
the energy, \iik\ must be smaller (usually sub- 
stantially smaller) than h. Large values of h, ap- 
proaching the linear size of the domain, will allow 
large scale movements, but will fail to perform ef- 
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ficiently intermediate-scale movements. Such in- 
termediate scale movements are usually neces- 
sary since there is no cheap way to choose shape 
functions that will exactly fit the required large 
scale movement (which of course depends on 
the external field /). Relaxation sweeps are thus 
needed at scales approximating all scales of the 
problem. 

A multiscale relaxation cycle is a process which 
includes particle-by-particle relaxation sweeps 
plus relaxation sweeps on scales a, 2a, 4a, . . . , 2 l a, 
where a is comparable to the inter-atomic dis- 
tance and 2 l a is comparable to the radius of the 
domain. The 1: 2 meshsize ratio of such a cycle 
is tight enough to allow efficient generation of 
all smooth movements of the atoms, even with 
a fixed and simple (e.g., pyramidal) shape func- 
tion. Such a cycle will not slow down. If every 
cycle involves a couple of relaxation sweeps at 
every level, the error (or the residual forces) will 
typically drop by an order of magnitude per cy- 
cle! 

A comment on names: What we call here 
"multiscale relaxation" is equivalent to a process 
which is called "multigrid" by some recent au- 
thors (see Sec. 5.1), but which has actually been 
named "unigrid" in the traditional multigrid lit- 
erature jl6) , @ , , since all its moves are still 
being explicitly performed in terms of the finest 
level (here: the level of moving single atoms). The 
term "multigrid" traditionally implies some ad- 
ditional important ideas, to be discussed next. 

1.4- Displacement fields 

Instead of performing one displacement at a 
time on the ensemble of particles, it will be more 
efficient (and important in other ways that will 
be explained in Sec. 5.1 below) to regard the set 
of displacement amplitudes u\, u^, ■ ■ ■ (at respec- 
tive centers X-^ ■ X 2 , . . . forming a grid with mesh- 



size h) as one field, to be jointly calculated. In- 
stead of (1.4), the field Sr of particle moves will 
then be given by 

k ^ ' 

Note the superscripts h added here to emphasize 
that x\, u\ and are in principle different on 
different grids h, although the shapes w% will 
often be independent of both k and h. The choice 
of wi will normally be such that 

E^(^) =1 for any n, (1.6) 

so that (1.5) is simply an interpolation of the dis- 
placement field u h — (ui,U2, ■ ■ •) from the grid- 
points to the (old) particle locations. 
For example with the choice 

^(0 = (l-|Ci|)(l-|C2|)(l-|Ca|), (1-7) 

relation (1.5) expresses the usual tri-linear in- 
terpolation: a linear interpolation in each of the 
three coordinate directions, performed succes- 
sively in any order. This is actually a second or- 
der interpolation; we generally say that the in- 
terpolation is of order p if, for any sufficiently 
smooth function U{x), 

It is easy to see that (1.6) is necessary and suf- 
ficient for the interpolation order to be at least 
1. 

Note that the pyramidal shape tu£(£) = 1 — |C| 
will not yield an interpolation, and is therefore 
inadequate here. For example, for a constant dis- 
placement field (ti^ independent of k) it will give 
sharply variable atomic moves. 

For any fixed particle configuration r, the en- 
ergy E(r + Sr) is, by (1.5), a functional of the 
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displacement field u h ; we denote this functional 
E h : 

E h {u h ) = E(r + Sr). 

In principle one needs to perform the interpola- 
tion (1.5) in order to evaluate E h (u h ). Usually, 
however, simpler and more explicit evaluation, or 
approximate evaluation, procedures can be con- 
structed. This is especially possible if h is small, 
comparable to the distances between neighbor- 
ing atoms (h = a, the finest scale in Sec. 1.3), 
because on such a scale one can assume u h to 
be smooth, since non-smooth motions have al- 
ready been efficiently performed at the parti- 
cle level (by the particle-by-particle relaxation 
sweeps; these motions are the "fast local adjust- 
ments" mentioned in Sec. 1.2). For such smooth 
u h the "strains", i.e., the differences u\ — be- 
tween the displacements at neighboring sites x\ 
and xf, are small. Since the change 8ri — 5r 3 
in the relative position of a pair of neighboring 
atoms i and j can, by (1.5) and (1.6), be written 
as a linear combination of these small strains, 
the change in their potential 

SVij = Vij(\n + S n - [r 3 + 5r 3 )\) - V i3 (\n - r 3 \) 

can be expanded in a Taylor series in terms of 
the small strains: 

M 
m— 1 k.l 

where the summation J2ki ' is over s ^ es x k and xi 
in the vicinity of ri and r 3 . The degree M of the 
expansion depends on the desired accuracy. Since 
the displacement will be part of a self-correcting 
iterative process, the minimal order M = 2 will 
usually suffice; higher orders would often be just 
a waste of effort. The coefficients A i3 ki m depend 
of course on the base configuration r. 

From (1.3), (1.1) and (1.9), or by some other 
approximation, one obtains E(r + Sr) as an ex- 
plicit functional, E h (u h ), of the displacement 



field. Hence, before ever returning to the parti- 
cles themselves, one can (approximately) calcu- 
late the displacement field which would reduce 
the energy as far as possible, i.e., the values u h * 
for which 

E h (u h *) = mm E h {u h ). (1.10) 

u h 

The fast (approximate) solution of (1.10) (i.e., 
finding u h *) is a lattice problem, to be discussed 
in the next section. As we will see, the process 
will include, even though indirectly, displace- 
ments on all coarser scales 2h, Ah, .... 

Having calculated a field u h which approx- 
imates u h * , one can then return to the par- 
ticles and displace all of them simultaneously, 
using (1.5). This collective motion would in- 
troduce accurately the main large-scale smooth 
change needed to approach the ground state 
r*. The remaining error would usually be non- 
smooth, hence quickly removable by a couple of 
additional particle-by-particle relaxation sweeps. 
When such sweeps start to indicate slow con- 
vergence (e.g., slow reduction of residual forces), 
the small remaining error is again smooth, hence 
it can again be substantially reduced by a new 
displacement field u , (approximately) minimiz- 
ing a new energy E h (u h ), similar to the above, 
but constructed with respect to the new (the 
latest) configuration r. And so on. Each such 
cycle, including a number of particle-by-particle 
relaxation sweeps plus forming (1.9), solving Eq. 
(1.10) and moving according to (1.5), will typi- 
cally reduce the error by an order of magnitude. 

1.5. Lattice minimization problem 

Instead of the particle problem (1.2)-(1.3), 
consider now the analogous lattice problem of 
finding the configuration u a * which minimizes 
the energy 
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E a (u a ) = Vijiuf - u?) - £ (!■") 

where each is a (scalar or vector) variable lo- 
cated at a gridpoint xf of a d-dimensional grid 
with meshsize a. The lattice energy E a (u a ) can 
arise either as the discretization of a continuum 
field energy, or as the particle displacement en- 
ergy described above ((1.10), with h = a). (For 
more comments on the relation between particle 
and continuum problems see Sec. 6 below.) 

Methods for minimizing (1.11) arc fully anal- 
ogous to those described above for the particle 
problem. The basic step is to change one variable 
uf so as to reduce E a (u a ), all other variables 
being held fixed. Repeating such a step at ev- 
ery gridpoint xf is called a point-by-point relax- 
ation sweep. Such a sweep may initially rapidly 
reduce the residuals dE a /duf, but as soon as 
the error u a * — u a becomes smooth, the conver- 
gence will become very slow. In fact, no local 
processing can efficiently reduce smooth errors, 
because their size depends on more global infor- 
mation (e.g., on the residuals over a domain sub- 
stantially larger than the domain over which the 
residuals have one dominant sign, or one domi- 
nant direction). Thus, steps of more global na- 
ture are required. 

Relaxation on scale h can be devised here in 
the same manner as above, based on moves 

5u? = w k (£^y k , (1.12) 

with the same shape function Wk(() as used in 
(1.4). A relaxation sweep on scale h is repeat- 
ing (1.12) with Xk scanning all the gridpoints 
of a grid with meshsize h. A multiscale relax- 
ation cycle is a process that typically includes a 
couple of relaxation sweeps on each of the scales 
a, 2a, . . . , 2 l a, where a relaxation on the scale a 
is just the point-by-point relaxation mentioned 
before, and 2 l a is a meshsize comparable to the 



linear size of the entire domain. (Usually, taking 
every other hypcrplane of a grid h will give the 
hypcrplanes of the grid 2h. In case of rectangu- 
lar or periodic domains, the finest meshsize a is 
customarily chosen so that the original grid has 
a multiple of 2" intervals in each direction, to 
maintain simplicity at the coarser levels.) Each 
such cycle will typically reduce the error by an 
order of magnitude. 

1.6. Multigrid cycles 

Instead of performing all the moves directly in 
terms of the finest grid, as in (1.12), it will usu- 
ally be more efficient and advantageous (see Sec. 
5.1) to consider the moves u h = (itf, u\,. . . , u h ) 
for grid h (at its gridpoints x\,x\,. . . ,x^ h , re- 
spectively) as a field which jointly describes dis- 
placements for the next finer field, it' 1 / 2 , via the 
relation 




(1.13) 



where the shape functions satisfy (1.6) (or even 
(1.8), for some chosen order p), so that (1.13) is 
in fact a proper interpolation (of order p). We 
will denote this interpolation from grid h to grid 
h/2 by 

§u h/2 = (LU) 

Each field u h will be governed by its own energy 
functional E h (u ), constructed from (1.14) and 
the relation 

E h (u h ) w E h ' 2 (u h ' 2 + 5u h ' 2 ), (1.15) 

approximations (wherever needed to obtain a 
simple explicit dependence of E h on u h ) being 
derived in the same manner as in Sec. 1.4 above. 
Note that E h is constructed with respect to a 
given, fixed configuration u h l 2 ; its coefficients are 
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re-derived each time the algorithm switches from 
moves on grid h/2 to moves on grid h. 

Indeed, this recursive structure of fields and 
energies is operated by a recursive algorithm. A 
multigrid cycle for grid h/2 is recursively defined 
as consisting of the following 5 steps: 

(i) Pre-relaxation. Perform v\ relaxation sweeps 

on grid h/2. 

(ii) Coarsening. With respect to the current con- 
figuration u h l 2 , construct the energy func- 
tional E h , using (1.15). 

(iii) Recursion. If h is the coarsest grid, mini- 
mize E h (u h ) directly: the number of vari- 
ables is so small that this should be easy to 
do. Else, perform 7 multigrid cycles for grid 
h, starting with the trivial initial approxi- 
mation u h = 0. 

(iv) Uncoarsening . Replace u h l 2 by u h / 2 + 5u h / 2 , 
using (1.13) with the final configuration u h 
obtained by Step (iii). 

(v) Post-relaxation. Perform v 2 additional relax- 

ation sweeps on grid h/2. 

The parameter 7 is called the cycle index; in 
minimization problems it is usually 1 or 2. (Much 
larger indices may be used in Monte-Carlo pro- 
cesses; see Sec. 5.4 below). When 7 = 1 the cycle 
is called a V cycle — or V(v\, 1^), showing the 
number of pre- and post-relaxation sweeps. For 
7 = 2 the cycle is called a W cycle, or W(vi, v<i). 
Fig. 1 displays the graphic origin of these names. 

A multigrid solver can consist of a sequence of 
multigrid cycles for the finest grid a. Each multi- 
grid cycle, with v\ + 1/2 = 2 or 3, will typically 
reduce the error by an order of magnitude. Since 
the work on each level h is only a fraction (j2~ d ) 
of the work on the next finer level (h/2), most of 
the work in a cycle are the v\ + v 2 point-by-point 
relaxation sweeps performed at the finest grid a. 
(An improved multigrid solver — the FMG al- 
gorithm — is described in Sec. 3.1 below). 



We have seen here an example of a fast multi- 
grid solver. As we will see below, multigrid-like 
structures and algorithms can serve many other 
computational tasks. 

2. Introduction 

Multiscalc (or "multilevel" or "multigrid" ) meth- 
ods are techniques for the fast execution of var- 
ious many-variable computational tasks defined 
in the physical space (or space-time, or any other 
similar space). Such tasks include the solution 
of many-unknown equations (e.g., discretized 
partial differential and integral equations), the 
minimization or statistical or dynamical simula- 
tions of many-particle or large-lattice systems, 
the calculation of determinants, the derivation 
of macroscopic equations from microscopic laws, 
and a variety of other tasks. The multiscalc al- 
gorithm includes local processing (relaxing an 
equation, or locally reducing the energy, or sim- 
ulating a local statistical relation) at each scale 
of the problem together with inter-scale inter- 
actions: typically, the evolving solution on each 
scale recursively dictates the equations (or the 
Hamiltonian) on coarser scales and modifies the 
solution (or configuration) on finer scales. In this 
way large-scale changes are effectively performed 
on coarse grids based on information gathered 
from finer grids. 

As a result of such multilevel interactions, the 
fine scales of the problem can be employed very 
sparingly, and sometimes only at special and/or 
representative small regions. Moreover, the inter- 
scale interactions can eliminate all kinds of trou- 
bles, such as: slow convergence (in minimiza- 
tion processes, PDE solvers, etc.); critical slow- 
ing down (in statistical physics); ill-posedness 
(e.g., of inverse problems); large-scale attraction 
basins (in global optimization); conflicts between 
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Fig. 1. Multigrid cycles. A full circle • stands for v\ relaxation sweeps (pre-relaxation) on the level shown. Empty circle 
o indicates U2 sweep (post-relaxation). A downward arrow \ means coarsening (calculation of the Hamiltonian E h from 
E h / 2 ). An upward arrow / stands for uncoarsening (displacing the field u h / 2 by the interpolated u h ). 



small-scale and large-scale representations (e.g., 
in wave problems); numerousness of interactions 
(in many body problems or integral equations); 
the need to produce many fine-level solutions 
(e.g., in optimal control) or very many fine-level 
independent samples (in statistical physics); etc. 

The first multiscale algorithm was probably 
Southwell's two-level "group relaxation" for solv- 
ing elliptic partial differential equations J7l[ | , first 
extended to more levels by Fedorenko |?5| . These 
and other early algorithms did not attract users 
because they lacked understanding of the very lo- 
cal role to which the fine-scale processing should 
be limited and of the real efficiency that can 
be attained by multigrid and how to obtain it 
(e.g., at what meshsize ratio). The first multigrid 
solvers exhibiting the generality and the typi- 



cal modern efficiency (e.g., four orders of mag- 
nitude faster than Fedorenko's estimates) were 
developed in the early 1970's f§, leading 
to extensive activity in this field. Much of this 
activity is reported in the multigrid books p9[ , 

fg, 0, ii, m, ©, (§, f§, @, n, s 

pl[ , Q and references therein. Recent develop- 
ments, including in particular the development 
of multiscale methods outside the field of partial 
differential equations, are reviewed in p^ j, |Q. 

This article will review some of the basic con- 
ceptual developments, with special emphasis on 
those relevant to lattice field calculations. Sec. 3 
will deal with the most developed area, that of 
multigrid solvers for discretized partial differen- 
tial equations; the solvers for the special case of 
Dirac equation are discussed in more detail in 
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Sec. 4, including new approaches appearing here 
for the first time (Sec. 4.7). 

In most problem environments, the multi-level 
approach can give much more than just a fast 
solver. For example, it yields very fast procedures 
for updating the solution upon local changes in 
the data: see Sec. 4.8. For LGT problems it can 
also provide very economic and rapidly updat- 
able data structures for storing inverse matrix 
information which allows immediate updating of 
various required quantities, such as determinant 
values needed for the fermionic interaction: see 
Sec. 4.9. 

In statistical field computations, in addition to 
such fast calculation with the fermionic matrix, 
multiscale processes can potentially contribute in 
the following three ways (first outlined in p2[|). 
Firstly, in the same way that they accelerate 
minimization processes (see Sec. 1 above), they 
can eliminate the "critical slowing down" in sta- 
tistical simulations: see Sees. 5.2-5.3. Secondly, 
they can avoid the need to produce many inde- 
pendent fine-level configurations by making most 
of the sampling measurements at the coarse lev- 
els of the algorithm: see Sees. 5.4-5.7. Thirdly, 
they can be used to derive larger-scale equations 
of the model, thereby avoiding the need to cover 
large domains by fine grids, eventually yielding 
macroscopic equations for the model: see Sec. 6. 

This potential has so far been realized only for 
very simple model problems. Extensions to more 
complex problems are not sure, and certainly not 
straightforward. But one may note here that a 
similar situation prevailed two decades ago in 
multigrid PDE solvers. It took years of system- 
atic research, indeed still going on today (and 
partly reported herein) , to extend the full model- 
problem efficiency to complicated real life prob- 
lems. 



3. Multigrid PDE Solvers 

The classical multi-scale method is the multi- 
grid solver for discretized partial differential equa- 
tions (PDEs). The standard (or "textbook") 
multigrid efficiency is to solve the system of dis- 
cretized equations in few (less than 10) "minimal 
work units' 1 '' , each equivalent to the amount of 
computational work (operations) involved in de- 
scribing the discrete equations; e.g., if the system 
is linear and its discretized version is described 
by a matrix, this unit would be the work involved 
in one matrix multiplication. (Note that even 
in case of dense matrices, as in integral equa- 
tions, multigrid methods allow an n x n matrix 
multiplication to be performed in only O(n) or 
0(n log n) operations; see Sec. 6.) 

Moreover, the multigrid algorithm can use 
a very high degree of parallel processing: with 
enough processors it can, in principle, solve a 
system of n (discretized PDE) equations in only 
0((logn) 2 ) steps. Note, for example, that each 
stage of the multigrid cycles described above 
can be performed at all gridpoints in parallel; 
only the stages themselves are sequential to each 
other. 

The history of multigrid solvers is marked 
by systematic development, which is gradually 
achieving the full standard efficiency stated above 
for increasingly difficult and complex situations. 
Some highlights of this development are pre- 
sented below. For more details see the basic guide 
||f7f , with some general updates in [|l2| , [jl4| , and 
more specific references given below. We will em- 
phasize general features that are relevant to the 
Dirac equations, but will postpone a specific dis- 
cussion of the latter to Sec. 4. 
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3.1. Scalar linear elliptic equations 

An elliptic differential equation is often equiv- 
alent to a minimization problem. For example, 
in d dimensions (x = (xi, . . . , x d ), = d/dx^) 
the diffusion equation 

d 

d^(a(x)d^U(x)) = f(x), a(x) > (3.1) 

(with suitable boundary conditions) is equivalent 
to finding the function U which minimizes the 
energy 



E(u) = I a(x) y^^(d^u(x)) 2 dx 



(3.2) 



among all functions u for which this integral 
is well defined (and which also satisfy suitable 
boundary conditions). The discretization of (3.1) 
on a grid with meshsize a can also be formulated 
as a minimization of a functional E a (u a ). The 
multigrid V cycle described above (Sec. 1.6) is a 
very efficient solver for such a problem. 

Instead of the formulation in terms of a mini- 
mization problem, the same algorithm will now 
be written in terms of the PDE and its discretiza- 
tion. This will enable us later to generalize it. Let 
the given PDE, such as (3.1), be generally writ- 
ten as 



Lu = f. 



(3.3) 



Let the discrete solution at gridpoint x\ of a 
given grid with meshsize h be denoted by U^. 
The grid-ft, approximation to (3.3) can be writ- 
ten as 



L h U h 



(3.4) 



where L h is an n h x n h symmetric and positive 
definite matrix. Starting with any initial approx- 
imation u h , a multigrid cycle for solving (3.4) is 
recursively defined as the following 5 steps for 



improving the approximation. (Note that h here 
corresponds to h/2 in Sec. 1.6.) 

(i) Pre- relaxation. Perform v\ relaxation sweeps 
on grid h. In each relaxation sweep the grid- 
points x\ , x\ , . . . , x'^ h are scanned one by one. 
For each gridpoint x\ in its turn, the value 
(the current approximation to l/f 1 ) is replaced 
by a new value for which equation (3.4) is satis- 
fied (holding u'j, for all j ^ i, fixed). This relax- 
ation method is called Gauss-Seidel relaxation, 
and is exactly equivalent to the point-by-point 
minimization (cf. Sec. 1.5) of the energy 



E h (u h ) = (u h ,L h u h ). 



(3.5) 



(ii) Coarsening. The equations for the next 
coarser grid, grid 2h, 



L 2h U 2h = R 



21, 



(3.6) 



should express the requirement that 

E 2h (U 2h )=minE 2h (u 2h ), 

where, similar to (1.15), 

E 2h ( u 2h ) = E h (u h + I% h u 2h ). 

Here E h is given by (3.5), u h is the current solu- 
tion on grid h (the final result of Step (i)), and 
1% is the interpolation from grid 2h to grid h, 
defined as in (1.13)-(1.14), by 



(iLkk = w 2 



21, 



X* — X 

2h 



21, 



(3.7) 



It is easy to see that this requirement is equiva- 
lent to defining 



R 2h = fihyh_ L h v h} 
and 



J^2h j2h jji jh 



where 



T 2h 

1 h 



2- d (4j T , 



(3.8) 



(3.9) 



(3.10) 
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superscript T denoting the adjoint operator (trans- 
position). The factor 2~ d turns into an av- 
eraging operator, so (3.8) means that R 2h is ob- 
tained by local averaging of the current residual 
field R h = f h - L h u h . Also, with this factor, 
L 2h , defined by (3.9), is a grid-2/i approximation 
to L; hence (3.6) is nothing but a grid-2/i approx- 
imation to the error equation L h e h — R h , where 
e h = U h — u h is indeed the error which U 2h is 
designed to approximate. In fact, instead of (3.8) 
any other averaging of R h will do, and instead 
of (3.9) a simpler (and much cheaper computa- 
tionally) L 2h can usually be used, derived, e.g., 
by direct discretization of L on grid 2h. 

(iii) Recursion. If 2h is already the coarsest 
grid, solve (3.6) directly (or iteratively; in any 
case this should be cheap, since the number of 
unknowns is very small). Otherwise, perform 7 
multigrid cycles for solving (3.6), starting with 
the trivial approximation u 2h = 0. 

(iv) Uncoarsening . Interpolate u 2h to grid h 
and add it as a correction to u h ; that is, 



u h + l£ h u 



21: 



(3.11) 



(v) Post-relaxation. Perform v% additional re- 
laxation sweeps on grid h. ■ 

The multigrid cycles thus defined, for cycle in- 
dices 7=1 and 7 = 2, are displayed in Fig. 1 
above. 

The same cycles can be used even in case the 
elliptic problem is not equivalent to a minimiza- 
tion problem. Correspondingly, there is much 
freedom in selecting the algorithm components 
(relaxation, inter-grid transfers I^u and J? , and 
the coarse grid operator L 2h ) and in treating 
boundaries (see Sec. 3.2 below). 

In fact, the efficiency of the multigrid cycle 
(e.g., its asymptotic convergence factor) can ex- 
actly be predicted in advance by local mode 
{Fourier) analysis |l3| ; so exactly indeed that 
the prediction can be used in algorithmic design 



(choosing optimal relaxation, inter-grid trans- 
fers, etc.) and program debugging. In particular, 
it yields general rules (summarized in Sec. 3.4 be- 
low) for the required orders of interpolation 1^ 
and restriction J? , and a general, easily com- 
putable yardstick for measuring the efficiency of 
the interior (away from boundaries) relaxation, 
called the smoothing factor of relaxation, ~p fusf . 
This 71 is defined as the factor by which each 
sweep of relaxation reduces the high-frequency 
components of the error, i.e., those components 
that cannot be reduced by the coarse-grid correc- 
tions. With proper choice of the inter-grid trans- 
fer, the error reduction factor per multigrid cycle 
should approach ~p Ul+ " 2 . 

For uniformly elliptic equations of second or- 
der, Gauss-Seidel (GS) relaxation often yields 
the best smoothing factor per operation per 
point. It is usually most effective when done in 
Red-Black (RB) ordering: first the "red" points 
(say those x^ = {x h }1 x^ d ) with even x j a /h) 
are relaxed, then the "black" ones (with odd 
J2 a xl ja/h)- In the case of the Poisson equa- 
tion ((3.1) with constant a(x)) in two dimension 
(d = 2), for example, RB-GS relaxation costs 
only 4 additions per point and yields 71 = .25 
. Relaxation of higher order uniformly elliptic 
scalar equations, such as the biharmonics equa- 
tion, can attain a similar efficiency by writing 
them as a system of second order equations, each 
relaxed by RB-GS. When the uniform ellipticity 
deteriorates, other relaxation schemes should be 
employed: see Sees. 3.6 and 3.7 below, and Sec. 3 



in (17 



How many cycles are needed to solve the prob- 
lem (3.4)? This depends on the quality of the 
initial approximation, u h0 , and on the required 
size of the final error || u h — U h ||. In particular, 
if the following algorithm is used, only one cycle 
will often do. 
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The full-multigrid algorithm with n cycles per 
level, or briefly the n-FMG Algorithm for level h 
is recursively defined as follows. If h is the coars- 
est grid, solve (3.4) directly. Otherwise, apply 
first the n-FMG algorithm to the corresponding 
equation on grid 2h 

L 2hTj2h = j2h = jVifh^ (3 12 ) 

obtaining for it an approximate solution u 2h . 
Then interpolate the latter to the fine grid 

u h0 =I% h u 2h , (3.13) 

and perform n multigrid cycles for level h to im- 
prove this initial approximation, yielding the fi- 
nal approximation u . 

An example of the entire flow of the 1-FMG 
algorithm, employing one V cycle at each level, 
is shown in Fig. 2. Such an algorithm (with v\ + 
V2 = 2 or 3) is usually enough to reduce the error 
below discretization error, i.e., to yield 

|| u h -U h || < C || U h -U ||, (3.14) 

(with C = .5, say), provided the problem is uni- 
formly elliptic, and the proper relaxation scheme 
and inter-grid transfers are used. In particular, 
the required order of the solution interpolation 
(3.13) (unlike the order of the correction inter- 
polation (3.11)) depends on the norm || • || for 
which one wants (3.14) to be satisfied: see Sec. 
3.4. 

One can of course append additional n* cycles 
to the 1-FMG algorithm, thus satisfying (3.14) 
with much smaller C; typically C w 10~ n *. This 
may be wasteful: it will bring u h closer to U h , 
but not closer to the differential solution U. 

3.2. Various boundaries and boundary 
conditions 

Along with the interior equations (approxi- 
mating the PDE), the discretized boundary con- 



ditions should also be relaxed, and their remain- 
ing residuals should be averaged and transfered 
to serve as the forcing terms of the boundary 
conditions on the next coarser grids. 

Special attention should be paid to the inte- 
rior equations in the vicinity of the boundary. 
The error-smoothing effect of relaxation is dis- 
rupted there in various ways, and the correct rep- 
resentation of the near-boundary residuals on the 
coarser grid is generally more complicated than 
(3.8) and depends on the shape of the boundary 
and the type of boundary conditions. 

A general way around these difficulties is to 
add extra relaxation steps near the boundary, 
especially near re-entrant corners and other sin- 
gularities. With the suitable relaxation scheme 
this can reduce the near-boundary interior resid- 
uals so much that their correct representation on 
the coarser grid is no longer important. 

The work added by such near-boundary extra 
relaxation steps is negligible compared to that of 
the full relaxation sweeps. It can be proved that 
with such steps the efficiency of the multigrid cy- 
cle becomes independent of the boundary shape 
and the type and data of the boundary condi- 
tions p3| . This has been shown (experimentally) 
to be true even in particularly difficult bound- 
ary situations, such as: highly oscillatory bound- 
ary curves and/or boundary data and/or bound- 
ary operators (with the oscillation wavelength 
comparable to the meshsize) p^l ; free bound- 
aries |34j; "thin" domains (much thinner than 
the meshsize of some of the employed coarser 
grids) (62); domains with small holes (see Sec. 
3.9 below); etc. 

3.3. FAS: nonlinear equations, adaptive 
resolution 

A very useful modification to the multigrid 
cycle is the Full Approximation Scheme (FAS 
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Fig. 2. FMG Algorithm with 4 levels and one V cycle per level. A crossed circle © stands for a direct solution of equation 
on a coarsest grid. A double upward arrow indicates interpolation of solution to a new level. All other notations are 
the same as in Fig. 1. 



p6[), in which the coarse grid equations (3.6) 
are rewritten in terms of the "full approximation 
function" 

U 2h =7tu h + U 2 \ (3.15) 

where l\ u h is the representation on the coarse 
grid of u , the current fine grid approximation. 
This yields the coarse-grid equation 

L 2h U 2h = f h = L 2h Qfu h ) + R 2h . (3.16) 

In the uncoarsening step, u 2h in (3.11) should of 
course be replaced by u 2h — I h u h , where u 2h is 
the approximate solution to (3.16) obtained in 
the recursion step. 

One advantage of the FAS is that it can di- 
rectly be applied to nonlinear L h , without any 
linearizations: the same simple L 2h can serve in 
(3.16) as in (3.12) (whereas in (3.6) it could not, 
unless the equations are linear). Since L h , L 2h , . . . 
are all similar, unified programming for all levels 
is facilitated. 

The FAS-FMG algorithm solves nonlinear prob- 
lems as fast as linear ones, namely, in less than 
10 minimal work units. No Newton-Raphson it- 
erations are required. Also, solution tracing pro- 
cesses (embedding, continuation, searches in bi- 
furcation diagrams, etc.), needed in case of severe 



nonlinearities, can be performed very cheaply, 
by procedures which employ the finest grid very 
rarely. For example, continuation processes can 
often be integrated into the FMG algorithm, at 
no extra cost; i.e., a problem parameter gradu- 
ally advances as the solver (cf. Fig. 2) proceeds 
to finer levels. 

A general advantage of the FAS is that av- 
erages of the full solution, not just corrections, 
are represented on all coarser grids (hence the 
name of the scheme). This allows for various 
advanced techniques which use finer grids very 
sparingly. For example, the fine grid may cover 
only part of the domain: outside that part / 
of (3.16) will simply be replaced by the origi- 
nal coarse grid right-hand side, f 2h of (3.12). 
One can use progressively finer grids confined 
to increasingly more specialized subdomains, ef- 
fectively producing better resolution only where 
needed. In this way an adaptive resolution is for- 
mulated in terms of uniform grids, facilitating, 
e.g., low-cost high-order discretizations as well 
as fast multigrid solvers. The grid adaptation it- 
self (i.e., deciding where to introduce the finer 
level) can be done with negligible extra work 
(no repeated solutions) by being integrated into 
the FMG algorithm (at the double-arrow stages 
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of Fig. 2). As a refinement criterion (indicating 
where a further refinement of the currently- finest 
grid h is needed) one can use the local size of 
17 -f 2h |, which measures the correction intro- 
duced by grid h to the grid-2/i equations. More- 
over, each of the local refinement grids may use 
its own local coordinate system, thus curving it- 
self to fit boundaries, fronts, discontinuities, etc. 
Since this curving is only local, it can be accom- 
plished by a trivial transformation, and it does 
not add substantial complexity in the bulk of the 
domain (in contrast to global transformation and 
grid generation techniques). See details in Sees. 
7-9 of [[(J, Sec. 9 of @, @ and a somewhat 
modified approach in j58|. 

3.4- Non-scalar PDE systems 

A system of q differential equations in q un- 
known functions is called non-scalar if q > f. 
General multigrid procedures have been devel- 
oped for solving (the discretized version of) such 
systems. The overall flow of the algorithm re- 
mains the same (Figs. 1 or 2). General rules 
were developed (in for deriving suitable re- 
laxation schemes and inter-grid transfers for any 
given system. 

The most important rules of inter-grid trans- 
fers are the following. (For more details see |l7| 
and [|l3|]. ) Let my denote the order of differen- 
tiation of the j-th unknown function in the i-th 
differential equation. Let m? denote the order of 
the correction interpolation 1%, applied in (3.11) 
to the j-th unknown function, and m, the order 
of the fine-to-coarse transfer I 2h applied in (3.8) 
to the residuals of the i-th equation. (If (3.10) 
is used then rrij = m % . The order of interpola- 
tion is defined at (1.8).) Let M- 7 be the order of 
the solution interpolation applied in (3.13) 
to the j-th function. Let p denote the order of 
discretization, i.e., || U h - U ||= 0(h p ). Finally, 
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let lj be the order of derivatives we want to cal- 
culate for the j-th function, i.e., the order of its 
derivatives entering into the norm || • || used in 
(3.14). Then, to guarantee the full possible effi- 
ciency of the multigrid cycle it is required that 

mi + m? > niij. (3-17) 

In the border case m t + m? = rriij the algorithm 
may sometimes still perform satisfactorily. To 
guarantee further that the minimal number of 
cycles is used it is also required that 

Mi>p + £j. (3.18) 

In the case of first order systems, such as 
Dirac equations, since = 1 it follows that 
m t = m? = 1, (i.j = X,...,q). These minimal 
orders are very convenient: they mean that in un- 
coarsening, each Suf h can simply be taken as the 
value of any neighboring ui' 2 , and in coarsening 
the residual Rg can be added to any neighbor- 
mg Rf! . 

The main tool developed (in |l^| ) for analyz- 
ing and discretizing general non-scalar schemes, 
and for deriving suitable relaxation schemes for 
them, is the principal determinant operator. To 
illustrate this tool and its uses, consider for ex- 
ample the Cauchy-Riemann system 

d 1 U 1 + d 2 U 2 = f 1 (3.19a) 

d 2 U 1 -d 1 U 2 = f 2 (3.19b) 

where d v U^ = dU p jdx v . It can be written in 
the matrix operator form 

LU = f, (3.20) 



where 




The principal determinant operator in this case 
is simply the determinant of L 
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detL = -d 2 -d 2 = -A, (3.21) 

i.e., the Laplacian. 

For more complicated nonlinear systems of 
nonuniform order, one has to include only the 
principal part of the determinant of the lin- 
earized operator. The principal part depends 
on the scale under consideration. At sufficiently 
small scales the principal part is simply the terms 
of det L with the highest-order derivatives. 

From (3.21) one can learn that, like the Laplace 
equation, the Cauchy-Riemann system is second- 
order elliptic and requires one boundary condi- 
tion (although there are two unknown functions). 

Any discretized version of (3.19) can analo- 
gously be written in the form 

L h U h = f h , (3.22) 




For example, if conventional (non-staggered) cen- 
tral differencing is used, then <9( l is defined, for 
any function ip, by 

d£<p(x 1 ,x 2 )= j K {ip(x 1 + h,x 2 ) , 3 23 « 

-<p(xi - h,x 2 )}, 

and similarly d 2 . In this case 

clctL' 1 = -{d^ f - (d' 2 1 ) 2 = -A 2h , (3.24) 

where A 2h is the usual 5-point discrete Lapla- 
cian, except that it is based on intervals 2h, 
twice the given meshsize. From this one can learn 
that this central discretization of the Cauchy- 
Riemann system suffers from the same troubles 
as A 2h , namely: 

(1) The given grid h is decomposed into 4 
disjoint subgrids, each with meshsize 2h. The 
equations on these subgrids are decoupled from 



each other locally, thus forming 4 decoupled sub- 
systems. Hence the discrete solution will have 
large errors in approximating solution derivatives 
(whenever the discrete derivative involves points 
from different subgrids). Note that in the case 
of the Cauchy-Riemann equations, each subsys- 
tem involves values of both C/ 1 '" and U 2,h , but 
at different locations. 

(2) The discretization is wasteful, since it ob- 
tains grid-2/i accuracy with gvid-h labor. 

(3) The usual multigrid cycle will run into the 
following difficulty. The error components slow 
to converge by relaxation may be smooth (i.e., 
locally nearly a constant) on each subgrid, but 
generally they must be highly oscillatory on the 
grid as a whole (the 4 local constants being un- 
related to each other). Such an oscillatory error 
cannot be approximated on a coarser grid. 

One way to deal with the latter trouble is to 
regard U h (or each U^'' 1 , in the Cauchy-Riemann 
case) as a vector of 4 different functions, one 
function on each subgrid, with a corresponding 
decomposition of the Laplacian (or the Cauchy- 
Riemann operator) into 4 operators, locally dis- 
connected from each other. With a similar de- 
compositions on coarser grids, the coarse-grid 
corrections to any subgrid on level h should only 
employ values from the corresponding subgrid on 
level 2h, and each fine grid residual should only 
be transferred to corresponding coarse grid equa- 
tions. With such decompositions the multigrid 
cycle will restore its usual efficiency. 

Another way to treat the same trouble is to 
employ the usual cycle, without decompositions, 
which would necessarily lead to slow asymptotic 
convergence, but to observe that the slow-to- 
converge components are those highly oscillatory 
components which are smooth on each subgrid. 
Such components have no counterpart in the dif- 
ferential solution, so they can simply be (nearly) 
eliminated by solution averaging p6[ . 
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Still another way, for dealing actually with all 
the troubles listed above, is to observe that any 
3 of the 4 subsystems are redundant, so they can 
simply be dropped, turning what we called 2h 
into the actual meshsize of the remaining grid. In 
case of the Cauchy-Riemann system this would 
give the staggered discretization shown in Fig. 3. 

The design of an efficient relaxation scheme 
for any q x q non-scalar operator L can always 
be reduced to the design of a scheme for each of 
the factors of the scalar operator det L . Namely, 
writing 

det L h = L'l ■■■L h s 

where each factor L'f is a first- or second-order 
operator. Once a relaxation scheme has been de- 
vised for each L'j, with a smoothing factor ~pj, 
these s schemes can be composed into a relax- 
ation scheme for the original system L , with a 
smoothing factor Jl — maxj ~pj . Sec |l7j for de- 
tails. In particular, one can relax the Cauchy- 
Riemann system, and many other elliptic sys- 
tems for which detL' 1 = ±(A 2 — m 2 ) s , as effi- 
ciently as relaxing the Laplacian, obtaining Jl = 
.25 (cf. Sec. 3.1). 

In the case of the Cauchy-Riemann and similar 
first-order systems, a simpler (but less general) 
description of the same relaxation is that it is a 
Kaczmarz relaxation scheme. 

Kaczmarz relaxation @, jy], Q, for a 
general linear system of real or complex equa- 
tions 

n 

^dijUj = fi, (i = l,...,n), 

3=1 

is defined as a sequence of steps, each one relax- 
ing one of the equations. Given an approximate 
solution u = (iti, . . . , u n ), a Kaczmarz relaxation 
step for the i-th equation is defined as the re- 
placement of u by the vector closest to it on the 
hyperplane of solutions to the i-th equation. This 



means that each Uk is replaced by u k + fiia* k , 
where a* k is the complex conjugate of djfc and 

, n \ n 

A = ( fi - ^2 a% i u i ) / H l a « I 2 ' 
^ i=i ' 3=1 

Kaczmarz relaxation always converges to a so- 
lution, if one exists, but the speed of conver- 
gence may be extremely slow. For scalar elliptic 
systems its smoothing factors are much poorer 
than those obtained by Gauss-Seidel. But for 
first order non-scalar systems it can attain opti- 
mal smoothing factors. Kaczmarz relaxation for 
Cauchy-Riemann equations in Red-Black order- 
ing for each of the two equations (which could be 
derived, as mentioned above, from RB-GS for the 
Laplacian), has the smoothing factor Jl = .25. In 
usual ordering (row by row or column by column) 
the smoothing factor is only Jl = .5. 

3.5. Discontinuous and disordered coefficients. 
AMG 

In the case of discontinuous or very non- 
smooth data, the multigrid algorithm requires 
certain modifications to retain its full above- 
stated "textbook" efficiency. 

The discontinuity of the external field f h (cf. 
(3.4)) does not affect the efficiency of the multi- 
grid cycle. It only requires some attention in 
the FMG algorithm: the discrete coarse-grid field 
f 2h (cf. (3.12)) should be formed by a full aver- 
aging of f h ; e.g., f 2h = I 2h f h with If 1 of the 
type (3.10). Also, when f h has a strong singu- 
larity (e.g., it is a delta function, representing a 
source) , some special relaxation passes should be 
done over a small neighborhood of that singular- 
ity immediately following the solution interpola- 
tion to a new level (the double arrow in Fig. 2). 

More difficult is the case of discontinuities in 
the principal terms; e.g., in the diffusion coeffi- 
cient a{x) (cf. (3.1) or (3.2)). Experiments re- 
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Fig. 3. Staggered discretization of the Cauchy-Riemann system. A and I show the location off/ 1 ' and U 2 ' h respectively. 
The discretizations of (3.19a) and (3.19b) are centered at grid vertices and plaquette centers respectively. 



vealed that the convergence factor per multi- 
grid cycle completely deteriorates when a(x) has 
strong discontinuities, in particular when a(x) 
discontinuously jumps from one size to an orders- 
of-magnitude different size. The main difficulty, 
it turns out, is to represent a(x) correctly on the 
coarser grids. The conductance (relatively large 
a(x)) and insulation (small a(x)) patterns can 
be too finely complicated to be representable on 
coarse scales. 

The first fairly successful approach to such 
strongly discontinuous problems 0j was to em- 
ploy the coarsening (3.6)-(3.10) with special 
shape functions w% h , such that, for each xf, 
the interpolation weight w 2 . ((x^ — x 2 ! l )/2h)j is 
proportional to a "properly averaged" (see be- 
low) value of the transmission a(x) between x\ 
and x\r . In fairly complicated cases this creates 
L 2h with quite faithful conductance-insulation 
patterns, yielding fully efficient cycles. This ap- 
proach led to very useful "black-box" solvers Q . 

Another possible approach is to derive L 2h not 
by (3.9) but as a direct discretization of L, with 
properly averaged values of a(x). "Proper aver- 
aging", as is well known in electrical networks, 
means harmonic averaging (i.e., arithmetic aver- 



aging of the resistance a(x)~ 1 ) in the transmis- 
sion direction and arithmetic averaging in the 
perpendicular direction. 

Both these approaches will fail in cases of 
conductance-insulation geometry that cannot be 
approximated on a uniform coarse lattice. Coarse 
levels should then abandon the lattice structure, 
adapting their geometry to that of the problem. 

In "algebraic multigrid" (AMG) solvers, intro- 
duced in the early 1980's (§13.1 in fo|, JyJ, @, 
pH , [[72) , |67[ ) , no grids are used. Even the spa- 
tial geometry behind the given (the finest) al- 
gebraic system need not be given explicitly (al- 
though its implicit existence may be important 
for the sparsity of the coarser levels formed by 
the algorithm). The next-coarse-level variables 
are typically selected by the requirement that 
each current-fine-level variable is "strongly con- 
nected" to at least some coarse-level variables, 
the strength of coupling being determined by 
the fine- level equations (e.g., by relative local 
sizes of the discrete conductance) . The inter-level 
transfers may also be purely based on the alge- 
braic equations, although geometrical informa- 
tion may be helpful (see Q , |^57| ) . 

AMG solvers usually involve much (one to 
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two orders of magnitude) more computational 
work, and also much more storage than the reg- 
ular ("geometric") multigrid. Also, they have 
been successfully developed so far only for lim- 
ited classes of problems (mostly scalar, and hav- 
ing "local positive definiteness" pj]|). However, 
they are often convenient as black boxes, since 
they require no special attention to boundaries, 
anisotropies and strong discontinuities, and no 
well-organized grids (admitting, e.g., general- 
partition finite element discretizations). More 
important, AMG solvers are indispensable for 
disordered systems, such as diffusion problems 
with arbitrary conductance-insulation patterns, 
or problems not derived from a PDE at all, such 
as the geodetic problem (which motivated much 
of the original AMG work ^J), the random- 
resistor network (which served to introduce AMG 
to the world of physicists [Q), the Laplace equa- 
tion on random surfaces fjj], and many others. 

Also, AMG-type solvers can be developed for 
new classes of problems. See the discussion in 
Sec. 4.6 below. 

3.6. Anisotropic equations. Convection 
dominated flows 

Good ellipticity measures (e.g., domination of 
viscosity) at all scales of the problem is essential 
for the success of the multigrid algorithms de- 
scribed above, since ellipticity means that non- 
smooth solution components can be calculated 



by purely local processing. (See Sec. 2 of 17 for 
general definitions of ellipticity measures.) 

Small-ellipticity problems are marked either 
by indefmitencss — discussed in Sec. 3.7 below 
— or by anisotropies. In the latter case, charac- 
teristic directions (directions of strong coupling, 
or convection directions, etc.) exist in the equa- 
tions. Non-smooth solution components can be 
convected along the characteristics, hence they 



cannot be determined locally. Therefore, to still 
obtain the "textbook" efficiency stated above 
(beginning of Sec. 3), the multigrid algorithm 
must be modified. 

Fully efficient multigrid algorithms have been 
developed by using the characteristic directions 
in various ways; e.g., in devising the relaxation 
scheme. See Sec. 3.3 in JItJ for the case that the 
characteristics are aligned with gridlines, and ||, 
@, ||, fH for the non-aligned case. 

Most multigrid codes in use today for high- 
Reynolds (small viscosity) steady-state flows do 
not incorporate this type of modifications. There- 
fore, although yielding large improvements over 
previous one-grid solvers, they are very far from 
attaining the "textbook" multigrid efficiency. 

3. 7. Indefinite problems. Wave equations 

Indefinite problems arise as the spatial part 
of wave equations in acoustics, seismology, elec- 
tromagnetic waves and quantum mechanics. A 
model example is the real equation 



AU(x) + k(x) 2 U{x) = f{x). 



(3.25) 



Slight indefiniteness . If k(x) 2 is everywhere 
small, so that only few eigenvalues of (3.25) are 
positive, the algorithm does not change on fine 
levels (except when some eigenvalues come too 
close to 0; see Sec. 3.9). But at a certain coarse 
grid one can no longer solve fast by using still 
coarser grids. This grid is very coarse, though, 
since it only has to provide approximations to 
those (very smooth) eigenfunctions correspond- 
ing to the positive eigenvalues, hence one can 
efficiently solve there, e.g., by Gaussian elimina- 
tion or Kaczmarz relaxation (see end of Sec. 3.4) 

If k{x) 2 is generally small, still making only 
few eigenvalues positive, but it is large in some 
small regions (creating local indefiniteness), the 
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same algorithm applies, except that Kaczmarz 
relaxation should be used at those special re- 
gions on all grids. Since the Kaczmarz smoothing 
is poor, more relaxation passes should be made 
over those regions. 

High indefiniteness is the more difficult case 
when the wavelength \{x) — 2n/k(x) is generally 
small compared to the linear size of the problem 
domain. Oscillatory solution components with 
wavelength near X(x) are not determined locally. 
Hence, on scales at which such components are 
non-smooth — i.e., on grids with meshsize h 
comparable to X(x) — the multigrid solver must 
be radically modified. 

The modified approach, similar to that which 
has been developed for integral equations with 
oscillatory kernels ]l5| , is to represent the solu- 
tion on each level ft as a sum 

U h (x)^^2A h J {x)e lk{ - x ^- x 

where the are d-dimensional unit-length vec- 
tors, uniformly covering the unit sphere, their 
number being m h = 0(h 1 ~ d ). Thus, for increas- 
ingly coarser spatial grids, increasingly finer mo- 
mentum resolution (denser grids ^j 1 ) are used. 
In some version of this approach, on sufficiently 
coarse levels the equations will become similar to 
ray formulations (geometrical optics). 

This approach can yield not only fast solvers 
to discretized standing wave equations, such as 
(3.25), but also the option to treat most of the 
problem domain on coarse levels, hence essen- 
tially by geometrical optics, with nested local 
refinements (implemented as in Sec. 3.3 above) 
confined to small regions where the ray formu- 
lation breaks down. Also, the implementation of 
radiation boundary conditions in this approach 
is straightforward. 

Such representations can also form the basis 
for very efficient multigrid algorithms to calcu- 



late many eigenf unctions of a given elliptic oper- 
ator; e.g., the Schrodinger operator in condensed 
matter applications. 

3.8. Small-scale essential features 

A small-scale essential feature is a feature in 
the problem whose linear dimension is compa- 
rable to the meshsize h but whose influence on 
the solution is crucial. Examples: a small hole in 
the domain (an island), on whose boundary the 
solution is prescribed; a small piece of a bound- 
ary where a different type of boundary condition 
is given; a small but deep potential well or po- 
tential barrier in the Schrodinger equation; large 
"topological charge" over few plaquettes in Dirac 
equations; etc. When its linear size is too small, 
such an essential feature may become invisible to 
the next coarser grid 2 h, yielding wrong coarse- 
grid approximations to smooth errors. 

One general way around this difficulty is to 
enlarge the feature so that it becomes visible to 
grid 2h, then enlarge it again on transition to 
grid Ah, and so on. The enlarged feature should 
be defined (e.g., the depth of the enlarged poten- 
tial well should be chosen) so that its global effect 
remains approximately the same. Also, to make 
up for the local mismatch, the uncoarsening step 
(see Sec. 3.1) should be followed by special relax- 
ation steps in the vicinity of the small features 
on the fine grid. 

If there are many, or even just several, small- 
scale essential features in the problem, then on 
a certain coarse grid they become so crowded 
locally that they can no longer be further en- 
larged. On such a grid, however, a proper relax- 
ation scheme can usually provide a fast solver, 
without using still coarser grids at all. This ap- 
proach was successfully implemented in the case 
of small islands . 

Another general approach is to ignore the 
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small feature on grid 2h altogether. This would 
normally cause the multigrid algorithm to slow 
down, sometimes even to diverge. A closer look 
shows however that most solution components 
still converge fast; only few, say v, well-defined 
components do not. The slow to converge error 
can therefore be eliminated by recombining v + l 
iterants of the algorithm. This means replacing 
the configuration obtained at the end of each cy- 
cle by a linear combination of the configurations 
obtained at the end of the last v + 1 cycles, so as 
to obtain the least square norm of the residuals. 
Here again, local relaxation steps in the vicin- 
ity of the small feature should be added. This 
method has been tried for a variety of small-scale 
features and was found to restore the full "text- 
book" multigrid efficiency p3| . 

3.9. Nearly singular equations 

An alternate explanation for the efficiency of 
the usual multigrid cycle for an elliptic equa- 
tion on any grid h can be given in terms of 
eigenfunctions, as follows. An eigenfunction am- 
plitude fails to converge efficiently in relaxation 
only if the corresponding eigenvalue, X h , is in 
magnitude much smaller than other eigenvalues 
on grid h. Such an eigenfunction, however, is so 
smooth that it is well approximated on the coarse 
grid; that is, the corresponding eigenvalue on the 
coarse grid, X 2h , satisfies 



\X h -\ 2h \ < |A 



2// 



(3.26) 



which guarantees a small relative error in the 
coarse-grid approximation to the eigenfunction. 

We call an equation "nearly singular" for grid 
h if some of its smallest eigenvalues A are so small 
that the discretization error in A (on grid h or 
2h or both) is comparable to A, so that (3.26) 
fails. The troublesome eigenfunctions are called 
"almost zero modes" (AZMs). 



To obtain the usual multigrid efficiency such 
modes should be eliminated. This can be done 
by recombining iterants, as in Sec. 3.8 above. See 
also in (27), ||. 

3.10. Time dependent problems and inverse 
problems 

For parabolic time- dependent problems it has 
been shown that multigrid techniques are ex- 
tremely efficient not just in that they solve fast 
the system of implicit equations at each time 
step A large additional benefit is that only 
rare activation of fine scales is needed wher- 
ever the solution changes smoothly in time; e.g., 
wherever and whenever the forcing terms are 



stationary |37 . Also, multileveling allows paral- 
lel processing not only at each time step, but 
across the entire space-time domain. Extensions 
of such ideas to other time dependent problems, 
including high-Reynolds flows, are currently un- 
der study. 

Inverse problems can become well-posed when 
formulated in a multi-scale setting, and can be 
solved at a cost comparable to that of solv- 
ing corresponding direct problems J75|, A 
demonstration of this is being developed for sys- 
tem identification and inverse gravimetric prob- 
lems. 



4. Multigrid Dirac Solver 

A major part of lattice field calculations is 
the inversion of the discretized Dirac operator 
L h appearing in the fermionic action. This is 
needed both by itself and also as part of calcu- 
lations of the determinant of L h in case of inter- 
acting fermions (see Sec. 4.9). For this purpose, 
repeated solutions of systems of the type 
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L h^h = jh ^ 

is needed. Multigrid solvers for this type of equa- 
tions will be described here. In addition, the 
multigrid solver can save most of the work in 
repeatedly re-solving (4.1) for new gauge fields 
and forcing terms f h (see Sec. 4.8). Multigrid 
fast gauge fixing will also be described (Sec. 4.2). 

The matrix L h itself depends on the lattice-/i 
gauge field U h . Note that the unknown function 
in (4.1) is therefore denoted by ^ here (instead 
of U h in the previous chapter), and its computed 
approximations will be denoted ^p h (instead of 
u h ). Also, ^ h and 4> h here are complex, not real, 
functions. For simplicity we will omit the super- 
script h until we need to distinguish between sev- 
eral levels of the algorithm. 

The methods described here were developed 
at the Weizmann Institute in collaboration with 
others: see §, @, [(l3), Some of the reported 
ideas were developed in parallel by other groups 




4.1. Model case: 2-D QED 

The main difficulties in solving lattice Dirac 
equations arc already exhibited in the simple 
case of the Schwinger model |6^| (two dimen- 
sional QED). In staggered formulation [Q, on 
fine enough grid with meshsize h, equation (4.1) 
at gridpoint (J, k) has the form 

• "'./I'/./.- = fj,k, 

where m q is the quark mass and are the dis- 
crete "covariant derivatives" , defined by 

D ii>j,k = 2^{Uj + i /2>k il>j+i,k ~ ^-i/2,fc^-i,fe) 



Thus, the gauge field U — U h is defined on grid 
links. Each value of U is a complex number of 
magnitude 1, and U* is its complex conjugate 
(hence inverse); i.e., 

U t = e ihA *, u;=e- thAe , (4.3) 

where £ = (j + 1 /2, k) or (j, k + 1/2) and At, the 
gauge field phase per unit length, is real. Note the 
meshsize dependence introduced in (4.3). Cus- 
tomarily, the finest-grid (the given lattice) mesh- 
size is h = 1, but we do not confine our dis- 
cussion to this case since we will need coarser 
grids as well, and also since we will like to discuss 
the limit h — ► 0. (See the corresponding differen- 
tial equation, and an alternative discretization, 
in Sec. 4.7). 

Gauge fluctuations. We assume physically re- 
alistic gauge fields, as produced, e.g., by the 
quenched approximation |55|| . This means sta- 
tistical fluctuations of U according to the gauge 
action 

S G =0Z[1- cos(h 2 curl A 3+1/2 , k+l/2 )\ , (4.4) 

summation being over all plaquettes (j + 1/2, k + 
1/2) and the discrete curl operator being defined 
by 

CUrM J + 1 / 2 ,fc+l/2 = Jl(Aj + i/2,k + -4j+l,fc+l/2 
— A?'+l/2,fc+l — Aj, k+1/2)- 

This implies that at each plaquette h 2 curl A has 
nearly Gaussian distribution with mean and 
variance (3" 1 . 

Gauge freedom. The physical problem is un- 
changed (since so are (4.2) and (4.4)) by any 
" 'gauge transformation" of the form 

ipj,k <- ij}j,ke iBj - k , (4.5a) 
U j+ i/ 2 ,k «- U j+1/2ik e i ^+^- B ^ (4.5b) 
Uj, k +i/2 <- U^ k+1/2 e l(B ^- B ^ (4.5c) 
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done at all sites and links, with any real grid 
function Bj^. An arbitrary such transformation 
can always be done on the problem. 

Correlation lengths. Because of this freedom, 
the gauge configuration may seem completely 
disordered. However, it is easy to see that the 
integral of the field A around any domain of area 
£ 2 , thus containing £ 2 //i 2 plaquettes,has variance 
£ 2 /i _2 /3 -1 , hence the field U has the correlation 
length £ G = 0(fi x l 2 h) (cf. Sec. 4.2). Other im- 
portant lengths in the problem are the matter 
correlation length £ TO = 0(m~ 1 ) 1 the pion cor- 
relation length = 0(£,m 2 h 1 / 2 ) 1 and Lh, the 
linear size of the lattice. 

Boundary conditions. The true physical prob- 
lems are given in the entire space (or plane, in 
this case). The computations are done on a finite 
L x L grid with periodic boundary conditions, 
which may best approximate the unbounded do- 
main. The periodicity of U introduces some ar- 
tificial topological difficulties (cf. Sec. 4.2), and 
other boundary conditions would in fact be com- 
putationally easier. 

4-2. Fast gauge fixing and updating 

To avoid the disorder introduced by the gauge 
freedom, one can (although we will see that per- 
haps one does not have to) "fix the gauge" into 
a smooth field by a gauge transformation (4.5). 
To see the smoothness, the field A = A h should 
first be recognized as a pair of fields, A 1 '" and 
A 2 ' h , the first consisting of the values of A h on 
horizontal links ( j + \ , k) , the second on vertical 
links (j, k + |) . One main reason for fixing the 
gauge is to see that each fixed A^> h tends to a 
continuous field A^ as h —> 0, enabling a better 
understanding of the fields, the equations, and 
the solver (cf. Sec. 4.7). The 2ir periodicity of 
hA h is not meaningful at that limit, so we will 



fix the gauge as a real field, loosing this periodic- 
ity. This is impossible to do with periodic bound- 
ary conditions. (Indeed, the boundary values are 
periodic in U h , hence in hA(mod 2ir), not in A 
itself.) Therefore we will fix the gauge piecewise, 
in a rectangular subdomain S. 

The sumdomain S can be large, in fact as large 
as the entire domain, but without its periodic- 
ity Indeed we can even retain the periodicity 
in one direction, x\ say, and keep the full size 
of the domain in the other direction, too, but 
with the periodicity cut out, say along the grid- 
line k = k . The simplest description will be in 
terms of a double value for each Aj +1 , 2 ko , de- 
noted A) +1/2 ko _ and A) +1/2 kg+ , referring to the 
sides k < fco and k > fco respectively. An actual 
gauge fixing will of course use only the values of 
one side, A^ +1 ^ 2 ka _ for example, calling them 
actually A) +1/2kQl giving for each A 2 jM+1/2 a 
different value than in our double- value descrip- 
tion. Such a subdomain S will be called the cut- 
torus cylinder. 

The values of |/i 2 curlA| determined (stochas- 
tically) by the action (4.4) are not necessarily 
small even at large /?; they are only small modulo 
2tt. Our first step is to turn them actually small 
(not just modulo 2ir) throughout S, by adding 
to hA 1 , wherever needed, an integral multiple of 
2ir. Note that we can do that even in the cut- 
torus cylinder case, due to the permitted double 
values along k — k . In fact, this operation is 
where double values are being introduced. 

Note 1 . After this operation one can still add 
any arbitrary (but the same) integer multiple of 
2-7T to all hA^ +1 ^ 2 k with the same fixed j, and 
similarly to all hA 2 fe+1 / 2 with the same k. 

Thus we get 

curlA = g, max \g\ < nh~ 2 (4.6) 

prescribed by the action. The gauge freedom im- 
plies that the field 
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(divA)j- k = l{A j+1/2jk - Aj- 1/2 ,k 
+Aj t k+i/2 - A hk _i/ 2 ) 

can be fixed arbitrarily at all grid vertices (j, k) 
— at least on any non-periodic piece of the lat- 
tice. To obtain as smooth a field as possible, the 
Landau gauge 

dWA = (4.7) 

is natural. With the given near Gaussian fluctu- 
ations of h 2 curl A on each plaquettes, (4.7) im- 
plies that each of the fields U' htJ - = exp{ihA^) 
has correlation length £ G = O^^h); that is, 
\£,Ag — ^Ap, | <C 7T if the distance £ between links 
I and £' is small compared with £ G . 

Fast multigrid solver. To explicitly find the 
field B defining the gauge transformation (2.5) 
that would yield (4.7) is equivalent to solving 
a discrete 5-point Poisson equation for B. The 
FMG solver (Fig. 2 above), employing RB-GS 
relaxation and v\ = v 2 = 1 would solve the prob- 
lem in less than 30 additions per gridpoint. (The 
needed multiplications are by powers of 2, which 
can be performed as additions.) 

As discussed in Sec. 3.1, the FMG solver would 
give us solutions satisfying something like (3.14). 
In the present case this means that any better 
accuracy in solving (4.7) is not needed because 
it will not give smaller variations in A, on any 
scale. 

Cut-torus gauge fixing. A particularly useful 
gauge fixing can be obtained for the cut torus 
cylinder. On the line k — ko double values of the 
transformation field B are allowed: Bj. ka - affect- 
ing A) ±1/2 ko _ and A 2 j k _ 1/2 , and B jM+ affect- 
ing A} ±1 / 2i fc 0+ an d ^j,fe+i/2- We can require the 
gauge transformation to give us 

A] M+ - A] M _ = C*, for all j, (4.8) 

where the constant C* is of course determined 
by the current value of T,j( A j,k + ~ A j,k -)> 



which has itself been determined by the sum over 
all the plaquettes of the function g, defined at 
(4.6); it is easy to see that C*hL/2ir must be 
an integer. The requirement (4.8) will determine 
Bj t k + — Bj^k -, so only one of the two values re- 
mains at our disposal, and we denote it Bj,fc . We 
can now further require the gauge transformtion 
to yield (4.7) throughout the periodic domain, 
including the cutline. (In defining div A at the 
cut point (j, k ) we can use for Al ±1 , 2 ko either 
A 1 , , . , or A 1 ,, ,„ , ; due to (4.8) the result 

j±l/2,k + j±l/2,k — ' v ' 

is the same.) This will give us again a discrete 
Poisson equation for B, but with completely pe- 
riodic boundary conditions. The equation is solv- 
able since the sum over its right-hand sides van- 
ishes. The undetermined additive constant in B 
is immaterial for the transformed field A. The 
fast multigrid solver described above still applies; 
in fact, for these periodic boundary conditions it 
is particularly simple, since no special treatment 
is needed at the boundary. (For these periodic 
boundary conditions, and for convenient lattice 
sizes such as L = 2 l , one can also use fast FFT 
Poisson solver, which is only slightly less efficient 
than multigrid; but it would not have the super- 
fast updates described below.) 

The result, which will be called the cut-torus 
gauge field, has the field A 2 smooth everywhere 
(including at the cut), and the field A 1 smooth 
everywhere except for a constant jump (4.8) at 
the cut line k — k . This field is uniquely de- 
termined by (4.6), (4.7) and the choice of k , 
except for the possible addition of constants 
2miir(hL)~ 1 and 2to27t(/i£)~ 1 to the fields A 1 
and A 2 respectively, with arbitrary integers m\ 
and m 2 . This freedom results from Note 1 above. 

Shifting the cut from ko to ko is a trivial trans- 
formation. If for example 1 < ko < ko < L, the 
new gauge field A is, for all j, 
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4+i/2,k = A )+i/2,k +C* + 2mm(hL)- 1 , 
(k < k < fe ) 

^j+i/2,fco = A )+i/2,fc + + ZmMhL)- 1 
A)+i/2- kQ+ = ^ +1/2 ,fc + C * + 2m l7 r(/ l L)- 1 

^+l/2,5o- =^ + l A fco +2mi7r( ' ii) " 1 

A j+i/2,fe = ^)+i/2,fe + ZmMhL)- 1 , 

(k < k or fc > fc ), 
A 2 = A 2 + 2m 2 7r(^i)- 1 , 



Aj+i/2,k+i and ^j,fe+i/2, by the amounts +S, +S, 
—S and —5 respectively, where S is determined 
stochastically by the action, then no re-fixing of 
the field is needed at all: if (4.7) is satisfied be- 
fore such a change, it will continue to be satisfied 
after it. 

A similar (piecewise or cut-torus) gauge fix- 
ing applies in any dimension and for non-Abclian 
gauge fields. 



with an arbitrary choice of the integers mi and 
m 2 . Saying that the cut is shifted away from 
a certain neighborhood will generally mean a 
choice of fc , mi and m 2 so that A is not only 
smooth but also as small as possible in that 
neighborhood. 

Thus, the cut-torus gauge has the advantage 
that it immediately describes, upto this simple 
shift, a field which is smooth at any chosen neigh- 
borhood. We will see its use in Sec. 4.7 below. 

Super- fast updates. Suppose the gauge field 
has been fixed as described above, and then one 
of its values has been stochastically changed. To 
re-fix the gauge it is enough to apply local re- 
laxation near the changed value. Only once in 
several such changes on grid h one needs to go 
locally to the coarser grid 2ft: transfer local resid- 
uals to grid 2ft, as in (3.8), then relax locally on 
the grid-2ft equations and correct the solution on 
grid h as in (3.11). Once in several such transfers 
to grid 2ft, a similar transfer is made from grid 
2ft to grid 4ft; etc. In this way the cost of fixing 
the gauge is only 0(1) per update, and the fields 
A 1 and A 2 are kept as smooth as they can be, on 
all scales. 

The range of the local relaxation may be min- 
imized by applying the stochastic changes in 
a special distributive manner (cf. Sec. 4.8). In 
particular, if instead of changing an individual 
phase at a time one changes simultaneously the 
four phases of a plaquette Aj +1 / 2 ,fc, Aj +1 _ k+1 / 2 , 



4-3. Vacuum gauge: £ G = oo 

Consider first the case A = and m q = 0. 
Eq. (4.2) is then identical to (3.22). As discussed 
there, it would be decomposed into 4 Cauchy- 
Riemann subsystems decoupled from each other, 
except that, due to the staggering in (4.2), only 
two such subsystems are present. Denoting by 
* 1 , * 2 , * 3 and * 4 the function * at gridpoints 
(j,k) with (j odd, k even), (j even, k odd), (j 
odd, k odd) and (j even, k even) respectively, one 
subsystem couples 'I' 1 with 'J 2 and the other cou- 
ples "J 3 with A multigrid solver can thus be 
built along the lines explain in Sec. 3.4, yielding 
the standard multigrid efficiency. 

Note in particular that corresponding to the 
four species of there are four kinds of equa- 
tions, each one centered at the gridpoints of an- 
other species. Similar four species and four kinds 
of equations arc defined on the coarse grid. Cor- 
rections should be interpolated to each species 
from the corresponding species on the coarse 
grid. Similarly, residuals of one kind of fine-grid 
equations should be transferred to the same kind 
on the coarse grid. 

A comment on the doubling effect. Even with 
an arbitrary gauge field, for m q = the above 
two subsystems remain decoupled. For m q ^ 0, 
since supposedly h <§C £ m , the two subsystems 
are still only weakly coupled locally. Such "dou- 
bled states" do not correspond to a physical re- 
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ality. They can be removed by the non-staggered 
Wilson discretization, which breaks chiral sym- 
metry. Another, perhaps simpler way to remove 
them (also breaking chiral symmetry, but keep- 
ing second-order accuracy) is to take only one of 
the staggered subsystems (e.g., "J/ 1 and \I/ 2 ) and 
replace the term m q ^>j t k in (4.2) by 

777 

^(^ fe+ i/2*^+i + E^fc-i/a^fc-i). (4.9) 

The doubling effect thus removed, this discretiza- 
tion is more convenient for developing the fast 
multigrid solver, and cost only half the solution 
time. We emphasize however that, as in Sec. 3.4, 
the original system can be solved just as fast (per 
unknown), even though less conveniently. 

The case A = and m q ^ is solved ba- 
sically by the same algorithm. On grids with 
meshsize h <C £ m , the principal local operator 
is as before, so the same relaxation would have 
nearly the same smoothing rate. On grids with 
h > 0(£ m ) the Kaczmarz relaxation will have 
no slowing down, so no grid coarser than that 
need be employed. Standard multigrid efficiency 
is still easily obtained. 

Applying the gauge transformation (4.5) to 
the A = case does not change the problem; it 
only expresses it in new variables, explicitly re- 
lated to the old ones. Therefore, the solver need 
not change either, it should only be expressed 
in terms of the new variables. This immediately 
yields the following algorithm, for any vacuum 
(transformable to A = 0) gauge field: 

1. The overall flow of the algorithm is still the 
same (e.g., Fig. 2; the coarsest grid, as explained 
above, should have 0(£ m ) meshsize, unless £ TO > 
Lh). 

2. Relaxation is still Kaczmarz (see end of Sec. 
3.4). 

3. The intergrid transfers 1^ and I^ h should 
use the gauge field as parallel transporter. This 
means that if a quantity is transferred from site 



x to site y (e.g., a residual calculated on the 
fine grid is transferred to a coarse gridpoint y, or 
a correction U 2h calculated on the coarse grid 
is transferred to a fine gridpoint y as part of 
an interpolation), the quantity should be mul- 
tiplied by U1U2 ■ ■ ■ U m , where U\, Ui, . . . , U m are 
the gauge field values along a sequence of lat- 
tice links leading from x to y, taking of coarse 
Ug instead of Ue when the link I leads in the 
negative direction. It is easy to check that, since 
curl A = 0, the transporter U1U2 ■ ■ ■ U m does not 
depends on the chosen route from x to y. 

4. The coarse grid operator L 2h is the exact 
coarse-grid analog of (4.2), with the coarse-grid 
gauge field A 2h obtained from the fine grid A h 
by injection. Namely, if the coarse-grid link L 
is the union of the fine-grid links £ and £' , then 
Af = (A'^ + A^)/2, hence U 2 L h = UfrUj? (noting 
the dependence on h introduced in (4.3)). 

Step by step, this algorithm will record results 
(e.g., magnitude of residuals) that are indepen- 
dent of the gauge transformation. Thus it will 
still exhibit the textbook multigrid efficiency. 

4-4- Scales h £ G or £ G >min(£ m , hL) 

The case discussed above is that of h 2 curl A = 
0(mod 27r), produced for example by (3 — > 00, 
giving also ^ G ^ 00. The same algorithm can 
still be employed, and usually at the same effi- 
ciency, on all grids with meshsize h <C £ G . On 
such grids the parallel transporter is still locally 
well defined (nearly route independent), hence 
the algorithm will perform locally close to its 
performance for A = 0. 

By saying that a multigrid algorithm works 
efficiently for a certain meshsize h we mean the 
qualification "provided the equation for level 2h 
are solved efficiently; whether or not this provi- 
sion holds is a separate discussion" . 

Still, even on levels with h <C £ G , a certain 
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trouble may arise; in fact sometimes it does, 
sometimes it does not: the system of equations 
may become nearly singular (cf. Sec. 3.9). This 
for example happens when m q = and the total 
topological charge Q ^ 0, where we define 

Q = — £[(/i 2 curlA i+1/2)fe+1/2 )(mod 2tt)]. 

Here the sum is over all plaquettes (J + 1/2, k + 
1/2), and as usual X(mod 2tt) = X+2mw, where 
m is an integer such that — tt < x + 2rmr < %. 
Usually (in periodic boundary conditions or vac- 
uum far field) Q is an integer. According to 
a special case of the Atiyah-Singer index theo- 
rem (see, e.g., f58|]) the continuum analog of Eq. 
(4.2) has Q eigenmodes with zero eigenvalues. 
Hence, and because of the doubling effect dis- 
cussed above, Eq. (4.2) will have 2Q almost-zero 
modes (AZMs). In this situation, as explain in 
Sec. 3.9, these AZMs will not converge in the 
usual multigrid algorithm, and they have to be 
expelled by recombining 2Q + 1 itcrants (or only 
Q + 1 iterants if one is careful to separately re- 
combine each subsystem). 

If m q > the eigenvalues are shifted away 
from 0, and then recombinations need not be 
done on fine grids, only on coarse ones where 
(3.26) still fails, hence their extra computational 
cost will usually be small. (If the coarse grid on 
which recombination is needed is to be visited 
many times, it may be more efficient to combine 
iterants so as to explicitly isolate and store the 
AZMs, and then use the latter directly at each 
new visit to the grid.) 

In case a large topological charge is concen- 
trated at few special plaquettes, it may be "in- 
visible" to the next coarser grid. This is a case 
of a small-scale essential singularity, which may 
be another reason for recombining iterations (see 
Sec. 3.8), even if the total Q vanishes. 

On grids where h approaches £ G , the algorithm 
of Sec. 4.3 should radically be changed (see Sec. 



4.5). However, if £ G is not too small compared 
with min(£ m , hL), then one can simply afford 
solving on such grids by a slower, usual itera- 
tive method, such as Kaczmarz relaxation with 
conjugate gradient acceleration. In particular, if 
such a grid with such a slower solver is taken as 
the coarsest level for a multigrid cycle that has 
several finer levels, then the cycle efficiency will 
not be substantially disturbed by it. 

Indeed, experiment showed that for m q and (3 
in the physically interesting ranges the standard 
multigrid efficiency is obtained by this algorithm 
(enforced on some levels as in Sec. 4.5 below) 
even without recombinations 0, EjJ, More 
recent experiments showed that smaller m q can 
be accommodated as efficiently by recombining 
iterants. 

Incidentally, the experiments showed that a 
special care should be taken in the statistical 
process that generates the gauge field. A slow 
Monte-Carlo process with a cold (A = 0) start 
may never yield Q / 0, and with a hot start (ran- 
dom A) may get stuck with too large Q and too 
short £ G . This topic belongs of course to Sec. 5 
below. 

4-5. Scales h ~ £ G <§; min(/iL, £ m ) 

The multigrid algorithm described above starts 
to have troubles when it is employed for grids 
whose meshsize h approaches £ G . Similar to the 
case of disordered diffusion problems (Sec. 3.5) 
the main difficulty has to do with the representa- 
tion A 2h of the gauge field on the coarser grids. 

Indeed, the values of h for which the algorithm 
still works has been pushed a significant factor 
up by replacing the naive injection (see Sec. 4.3) 
with a parallel transport of the gauge field itself, 
carefully separating different "kinds" of links. 
One "kind" of links, for example, connects -0 1 
points to neighboring ip 3 points; another con- 
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nects ip 3 to ip 2 ; etc. A link on grid 2h should 
be composed from a pair of grid-h links of the 
same kind. Since the location of such a pair docs 
not necessarily coincides with that of the grid-2/i 
link, the pair should sometimes be parallel trans- 
ported from another location. Such constructions 
are simpler to formulate if the coarsening is done 
one dimension at a time. See and || for de- 
tails. 

4-6. Disorder at £ G < h <C £ m 

As the meshsize becomes larger than £ G , the 
system of equations is stuck in "disorder" (which 
could be avoided, though: see Sec. 4.7). Simi- 
lar to the situation described in §3.5, the large- 
scale connections seem to no longer follow a well- 
ordered pattern similar to that on the fine grid. 
It is thus natural to seek an AMG-like approach 
in formulating the grid-2/i equations. (Such an 
attempt has been initiated by R. Ben-Av.) 

In the AMG approach L 2h (written out as a 
real system) is constructed by the Galerkin form 
(3.9), where if' 1 , in cases of self-adjoint systems, 
is chosen by (3.10). This reduces the problem of 
coarsening to that of constructing only a good 
interpolation scheme. 

However, our system (4.2) is not self-adjoint. 
It is tempting to turn it into self-adjoint by re- 
placing (4.2) with 

L h ^L h ^ = L'^f. 

This, however, would raise the order of deriva- 
tives in the system from 1 to 2, hence would re- 
quire, by (3.17), the construction of a higher or- 
der interpolation, which is considerably more dif- 
ficult. It may be better to stay with the original 
system and construct iff 1 = 2^ d (P^ h ) T , where 
I% h does not necessarily coincide with I^: in the 
same way that I^ h is constructed as a "good in- 
terpolation" (see below) for L h , I 2 h should be 



constructed as a good interpolation for the ad- 
joint of L h . 

A good interpolation in the AMG approach 
means not just good interpolation weights, but 
also a good choice of the coarse-grid variables. 
For both purposes one has to distil local rela- 
tions which must be satisfied (to accuracy orders 
spelled out in (3.17)) by all error functions that 
converge slowly under the employed relaxation 
scheme. 

One general approach for finding such lo- 
cal relations has been called pre-relaxation (see 
Sec. 6.1 of |p2T| ). Applying several sweeps of 
the given relaxation scheme to the homogeneous 
(zero right-hand side) equations will result in a 
typical shape of a slow-to-converge error. (For 
the homogeneous equations the solution van- 
ishes, hence the current approximation equals 
the current error.) Repeating this for several ran- 
dom initial approximations yields several such 
typical error functions, independent on each other 
even locally. At each point, a suitable local re- 
lation is any relation approximately satisfied at 
that point by all these error functions. It can be 
distilled from them by usual data fitting (least 
square) techniques. 

In this way L 2h is constructed from L h . Simi- 
larly L 4h can then be constructed from L 2h , and 
so on. 

The same approach could also be used at finer 
levels, where h « but it is much more ex- 
pensive and less efficient there than the parallel- 
transport algorithm described above. 

4-7. Introducing order 

The algebraic multigrid (AMG) approach is 
very expensive not only in deriving the coarse 
level operators (and re-deriving them upon each 
change in the gauge field), but also in operat- 
ing them, especially since they loose much of the 
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sparsity associated with distinguishing between 
different species. Avoiding the disorder that mo- 
tivates AMG by observing an underlying apriori 
order can make the algorithm simpler and much 
more efficient. Possible approaches are outlined 
below. 

First we observe that gauge fixing, and es- 
pecially re-fixing, has negligible cost compared 
with the Dirac solver itself (cf. Sec. 4.2). With 
the gauge field satisfying (4.7), and hence smooth 
for ft <C £ G , and with species labelled as in Sec. 
4.3, Eq. (4.2) in the limit h -> gives 



Di^ 1 + D 2 ^ 2 + m ? $ 4 = f 4 (4.10a) 

-M 1 + M 2 + mgl- 3 = / 3 (4.10b) 
and two similar equations with f 1 and / 2 , where 

D^d^-iA^. (4.10c) 



Since h <C £ m at all levels for which we need to 
construct a multigrid solver (see Sec. 4.4), we can 
assume in describing any underlying local order 
that m q — 0. Our system breaks then down into 
two subsystems decoupled from each other (see 
Sec. 4.3), one of them being (4.10). Defining 

*+ = * 1 -i* 2 , *~ = * 2 - i* 1 , (4.11) 

it is easy to see that for m q = Eq. (4.10) yields 

(Di±i£) 2 )* ± = / ± , (4.12) 

where / + = f 4 — if 3 and /~ = / 3 — if 4 . Define 
further S + and E~ to be solutions of 

{d 1 ±id 2 )E ± =i(A 1 ±iA 2 ). (4.13) 

Finally, defining $ + and <I>~ by 

* ± =e H± $ ± (4.14) 

and substituting into (4.12), we obtain, by (4.13), 



(0i ±id 2 )& ± = e-' s± f ± . (4.15) 

These relations show the underlying regular- 
ity in Eq. (4.2), because both (4.13) and (4.15) 
are regular elliptic systems, each in fact equiv- 
alent (when written as a real system) to the 
Cauchy-Riemann equations (3.19). Multiplying 
Eq. (4.13) by (<9i =F i(h), it can also be written 
as the Poisson equations 

AS ± = ±curM + idivA (4.16) 

This suggests that for the underlying functions 
to be well approximated on any grid 2ft, the 
gauge field A 2h should be generated from A h by 
requiring curlA 2/l and divA 2h to be local av- 
erages of curl A' 1 and divA h , respectively, and 
solving for A 2h via the algorithm of Sec. 4.2. 
(In particular, if divA h = on the finest grid 
by gauge fixing, it will remain so on all coarser 
grids, and a cut-torus gauge A h will give cut- 
torus gauge fields on all coarser grids, with the 
same cut and the same jump C*.) More impor- 
tant, when h > 0(£ G ), values of the topological 
charge | ft 2 curl A obtained from finer levels by 
such averaging may exceed n. Such values can- 
not actually affect because by (4.3) hA is 
only defined modulo 2ir. This may well explain 
the difficulty encountered at such scales. 

This also suggests two possible approaches 
around the difficulty. One is to treat large con- 
centrated topological charges that cannot be rep- 
resented on coarser grids by general methods de- 
veloped for small-scale essential singularities (see 
Sec. 3.8). Namely, smear the topological charge 
on a wider area at the coarser levels and/or re- 
combine iterants. 

Another, more radical approach, which can 
also treat difficulties arising from the imaginary 
part of (4.16), is to abandon (4.2), at least on 
coarser levels, and, with the gauge field fixed to 
satisfy (4.7), discretize (4.10) directly. In particu- 
lar, the covariant derivatives will be obtained 
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by direct central differencing of (4.10c), circum- 
venting the limitation on the size of \h 2 curl A |. 

This approach represents the fixed field as 
a field that has a continuous continuum limit, 
which is generally true only piecewise. It gen- 
erally contradicts, for example, periodic bound- 
ary conditions for the field U h — e lhA . This dif- 
ficulty does not seem to be substantial; first, 
because periodic boundary conditions do not 
represent any physical reality. Also, for peri- 
odic boundary conditions the above approach 
can be used patchwi.se, employing the cut-torus 
gauge. Namely, each multigrid process (relax- 
ation, inter-grid transfer) at any neighborhood 
can be done with the cut line shifted away from 
that neighborhood. 

In higher dimensions and for non-Abelian gauge 
fields, the "gauge phases per unit length" with 
the cut-torus fixing still have piecewise continu- 
ous limit, so the described approach seems still 
applicable. 

4-8. Superfast updates. Localization by 
distributive changes 

In addition to the fast solver, the multigrid 
structure can yield very fast procedures for up- 
dating the solution upon any local change in the 
data. 

For simplicity we will assume in the discus- 
sion here that m q = 0; for larger m q the range of 
influence of changes will be shorter, hence the as- 
sertions below will hold even more strongly. Also 
for simplicity we will consider the system (4.10), 
with the gauge field satisfying (4.7); gauge trans- 
formation will not change our conclusions either. 

Consider first the case of a change introduce 
to /q , the value of the forcing term / 4 at the 
origin. In vacuum (A = 0), this will change the 
solution i&j k, at distance r — h(j 2 + k 2 ) 1 / 2 from 
the origin, by an amount which is 0(r -1 ). More- 



over, the £-th derivative (or difference quotient) 
of the change will decay like 0(r _1_£ ). Thus the 
change is not necessarily very small, but becomes 
very smooth at a short distance (few meshsizes in 
fact) from the origin. This implies that in apply- 
ing the FMG algorithm (cf. Fig. 2) to the change 
in the solution, on every grid one has to employ 
relaxation only in the neighborhood of the origin 
(upto few meshsizes away). 

Furthermore, one can cut the work far more 
by introducing the changes, say in / 3 , in a dis- 
tributive manner. This means that changes are 
distributed to several values of / 3 at a time, 
according to a prescribed pattern. For exam- 
ple, changing simultaneously fj k and ffk-2 by 
+5 and —5 respectively is a first-order distribu- 
tive change. Changing (/ 3 fe _ 2 , / 3 fe , ./| fe+2 ) by 
(+S, —25, +5) is second-order, and so is changing 

{f!-2,k-2Jf-2,kJh-2Jlk) b y (+<*. -<*. -<*. + s )- 

Etc. The effect of an m-th order distributive 
change on the solution will decay as 0(r _1_m ). 
Hence with an appropriate choice of distribu- 
tion order (m = 1 may indeed suffice), the ef- 
fect of the solution becomes practically local, and 
can be obtained by few relaxation steps in some 
neighborhood of the change. Only once in several 
such changes on grid h one need to go locally to 
the coarser grid (cf. Sec. 4.2); and once in several 
such transfers to grid 2h, a similar transfer (local 
on scale 2h) is made from grid 2h to grid Ah; and 
so on. The work per change is just 0(1). 

The same must be true in a non- vacuous gauge 
field, as can be seen from (4.15), except that on 
grids with meshsize h>t; G the functions have 
random second "derivatives" , so the effect of sec- 
ond or higher order distributive changes is more 
complicated and requires a probabilistic investi- 
gation. 

Distributive changes do not span all the changes 
of interest, but all is needed to complement them 
are smooth (on scale h) changes. The latter can 
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be processed on the grid 2h. Moreover, they too 
can be distributive (on scale 2h), complemented 
by smooth (on scale 2h) changes; etc. 

Changes in the gauge field itself can be treated 
similarly: their effect, as can be seen from (4.16), 
can also be localized by distribution. If the 
changes are to be governed stochastically, one 
submits distributive changes to the Monte-Carlo 
process. This should be complemented by a cor- 
rective Monte-Carlo process on grid 2h (as one 
needs to do anyway to avoid slowing down of 
the simulation: see Sec. 5). The latter can be 
again distributive, complemented by a grid-4/i 
process; and so on. On each scale such distribu- 
tive changes have only local effects, easily estab- 
lished by local relaxation. 

4-9. Inverse matrix and determinant 

The multigrid structure can also provide ef- 
ficient ways for storing, updating and using in- 
formation related to the inverse matrix M _1 = 
(L h )~ 1 . For a large lattice with n sites, the stor- 
age of the inverse matrix would require 0(n 2 ) 
memory and 0(n 2 ) calculations, even with a 
fully efficient multigrid solver. Both can be re- 
duced to 0((£ + e~ 1//£ ) d n), where e is the rela- 
tive error allowed in the calculations and I is the 
interpolation order below, by using the following 
multilevel structure. 

Denoting the propagator from gridpoint x — 
(jh, kh) to gridpoint y = (j'h, k'h) by 

M-\ X ,y) = {{L^) ( . kU ., Miy 

the €-th "derivatives" (difference quotients) of 
this propagator, with respect to either x or y, 
decay as 0(\x — y|~ 1_£ ). Therefore, an border 
interpolation of the propagator from grid 2h to 
grid h will have at most 0{h l (\x — y\ — lh/2)~ e ) 
relative error, which will be smaller than e in the 
region 



\x-y\/h>K = Ce-V'+ifa 

where C is a (small) constant. Hence, propaga- 
tors M~ 1 (x,y) with \x — y\ > Kh need be stored 
on grid 2h only, except that, for a similar reason, 
those of them with \x — y\ > 2Kh need actually 
be stored only on grid Ah; and so on. 

This structure can be immediately updated, 
upon changes in the gauge field, especially if 
those are made in the above distributive man- 
ner (cf. Sec. 4.8). Changes is propagators de- 
scribed on grid 2h (associated with relaxing the 
smooth changes in the gauge field) affect those 
described on grid h through a FAS-like interpo- 
lation (cf. Sec. 3.3: it means correcting u h by 
^2h( u2h ~ Ifi uh )> exce Pt that here one interpo- 
lates both in x and in y). The cost per update is 
0(1), i.e., independent of lattice size. 

With M _1 thus monitored, one can inexpen- 
sively calculate changes in logdet M. For a small 
change SM in the gauge field 

(51ogdctM = ti(M~ 1 8M) 1 (4.17) 

which can be computed locally, based on M~ l (x, y) 
with neighboring x and y. For larger changes 
one can locally integrate (4.17), since the local 
processing also gives the dependence of M _1 on 
5M. Again, the amount of calculations per up- 
date does not depend on the lattice size. 

5. Multiscale Statistical Simulations 

The problem of minimizing a particle energy 
E(r), or an energy E a (u a ) of a function u a de- 
fined on a d-dimcnsional lattice with meshsize 
a, has been used in Sec. 1 to introduce several 
multiscale processes. Similar processes can also 
be very useful in accelerating statistical simula- 
tions governed by such an energy (or Hamilto- 
nian) E a . The first objective of such simulations 
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is to produce a random sequence of effectively in- 
dependent configurations u a , in equilibrium, i.e., 
in such a way that the probability of any u a to 
appear anywhere in the sequence is given by the 
Boltzmann distribution 



(5.1) 



P(u a ) = I e -£ Q (« Q )/r 
Zj 

where T is the temperature and Z is a normal- 
ization factor such that J P(u a )Du a = 1. 

The minimization problem is in fact easily seen 
to be the limit T — > of (5.1). It has been 
shown in Sec. 1 that multiscale processes can 
solve this limit problem in 0[n) computer op- 
erations, where n — L d is the number of lattice 
sites. In a similar way it will be shown here that 
the multiscale-accelerated simulation needs only 
0(n) operations to produce each new, effectively 
independent configuration. 

Such accelerations were first introduce, inde- 
pendently, in Q and in Sec. 7.1 of [pBJ. (A sig- 
nificant difference is that in |36| the "constant 
interpolation" is used. The implications of this 
will be examined below.) Another type of accel- 
eration was introduced for Ising spin models in 
0, and then extended by embedding Q to 
many other models (see review in (fzTJ). The re- 
lation between, and combination of, these two 
types of acceleration will be outlined below. 

The central claim, however, of this chapter 
(following fl^| ) will be that accelerating the pro- 
duction of effectively independent configurations 
is not exactly the main issue. The real objec- 
tive of the statistical simulations is to calculate 
some average properties of the configurations it Q , 
and the main issue is how fast deviations from 
any desired average can be averaged out. The 
multigrid structure will be shown very useful in 
cheaply providing much statistical sampling in 
its coarse levels, thus promoting fast averaging 
of large scale fluctuations, which are exactly the 
kind of fluctuations not effectively self-averaged 



in any one produced configuration. 

In usual Monte-Carlo processes, one useful 
measurement cost 0(L d+z ) operations, where 
typically z » 2. The acceleration techniques ide- 
ally remove the Critical Slowing Down (CSD), 
i.e., the factor L z . For the ideal Gaussian case it 



has been shown 18 that using coarse-level sam- 
pling can eliminate the volume factor L d as well. 
The development of such techniques to other 
models, including spin models, is discussed be- 
low. 

5.1. Multiscale Monte-Carlo: unigrid 

Similar to the point-by-point relaxation in 
Sect. 1.5, the usual way to simulate (5.1) is the 
point-by-point Monte-Carlo process. The basic 
step is to simulate one variable uf; holding all 
other uf fixed, (5.1) describes a probability dis- 
tribution for uf , which can easily be simulated, 
e.g., by assigning to u a a random value with 
that distribution. A point-by-point Monte- Carlo 
sweep is the repetition of the basic step at all 
sites (i = 1,2, ... ,n). A long enough sequence 
of such sweeps will produce a new (effectively 
independent) configuration, the probability dis- 
tribution of which is indeed (5.1). 

And similar to the point-by-point relaxation, 
the main trouble of this Monte-Carlo process is 
its slowness: typically, for a lattice with n = L d 
sites, a sequence of 0{L 2 ) sweeps is needed to 
produce a new configuration. The same smooth 
components which are slow to converge in any 
local relaxation, are also slow to change in any 
local Monte-Carlo process, and for similar rea- 
sons. So steps of more collective nature are re- 
quired here as well. 

A Monte- Carlo step on scale h is a collective 
move of the form (1.12), whose amplitude Uk is 
decided stochastically, under the probability dis- 
tribution deduced for it from (5.1). A Monte- 
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Carlo sweep on scale h is the repetition of such 
a step at all points Xk of a grid with meshsize 
h (h > a), laid over the original grid a. A (un- 
igrid) multiscale Monte-Carlo cycle is a process 
that typically includes a couple of Monte-Carlo 
sweeps on each of the scales a, 2a, . . . , 2 e a, where 
a sweep on scale a is just the usual point-by- 
point Monte-Carlo sweep, and 2 l a is a meshsize 
comparable to the linear size of the entire lat- 
tice. Under ideal situations, each such cycle will 
produce a nearly independent configuration. 

Such cycles were introduced in fl53fl , Q un- 
der the name "multigrid" . We will use for them 
the adjective "unigrid" to emphasize that all the 
moves are still performed in terms of the finest 
grid (the given lattice), thus distinguishing this 
process from the more developed multigrid; cf. 
the comment at the end of Sec. 1.3. 

Compared with the more developed multigrid 
that will be described later, the unigrid cycle 
has two basic disadvantages. First, the moves 
on coarse scales are very expensive: each move 
(1.12) on scale h involves changing 0((h/a) d ) 
values in the basic grid. This disadvantage is not 
so severe for cycles that employ the same number 
of sweeps at all scales, since the number of moves 
in each sweep on scale h is just an 0((/i/a)~ d ) 
fraction of their number in each sweep on the 
basic grid. But we will see below that for statis- 
tical purposes one would better do many more 
sweeps on coarse scales than on fine ones, so this 
disadvantage will become crucial. 

A second, not less serious disadvantage is that 
it is often impossible to prescribe in advance the 
shape functions w£(C) that control the large- 
scale moves (1.12) so that high enough proba- 
bilities will result for producing reasonably large 
amplitudes u£. Suitable shape functions can be 
prescribed apriori only if the shapes of the prob- 
able large scale moves are indeed sufficiently in- 
dependent of the current configuration (see also 
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Sec. 5.6). 

Even when such apriori probable shape func- 
tions exist, their calculation is often best ob- 
tained in the multigrid manner, in which each 
level of shape functions w\ is derived from the 
next-finer- level shapes w^ 2 . We have seen the 
importance of such derivation even for determin- 
istic problems, e.g., in Sees. 3.5 and 4.6 above, 
where the interpolation l\\ between neighboring 
scales is non-trivial and need be separately de- 
rived at each level. For stochastic problems the 
need for such a hierarchical construction of large- 
scale moves is much stronger, because of the sta- 
tistical dependence between moves at different 
scales, especially neighboring scales. 

On the other hand, the unigrid approach has 
the important advantage that it does not re- 
quire derivations of the coarse level Hamiltoni- 
ans, which can be quite problematic: see Sec. 5.5. 

5.2. Multigrid Monte- Carlo 

Thus, instead of performing the moves directly 
in terms of the finest grid, the multigrid ap- 
proach, similar to Sec. 1.6 above, is to consider 
the moves u 2h on any grid 2h as a field which 
jointly describes displacements for the next finer 
field, u h , by the relation 

6u h = I h 2h u 2h , (5.2) 

where I 2h is, as before, an operator of interpola- 
tion from grid 2h to grid h. Assume for now that 
the grid-2/i Hamiltonian, at any given fine-grid 
configuration u h , 

E 2h (u 2h ) = E h (u h + l' 2 \u 2h ) (5.3) 

has been derived as an explicit function of u 2h 
(with coefficients possibly depending on u h ). 
Then the multigrid cycles described above (Fig. 1) 
can be employed here; the only difference be- 
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ing that "relaxation sweeps" are replaced by 
"Monte-Carlo sweeps" . 

Under ideal situations, each cycle, with v\ + 
i>2 = 2 or 3, would produce a new, effectively 
independent configuration. More precisely we 
mean by this that the correlation between any 
quantity of interest in the initial configuration 
and in the one produced after k cycles decays like 
e~ fc / T , where r, the cycle auto-correlation time, 
is independent of the lattice size L; indeed r is 
often smaller than 1 . Observe that as long as the 
cycle index 7 is less than 2 d , most of the work in 
each cycle is just the V1 + V2 Monte-Carlo sweeps 
on the finest lattice. So the total work for pro- 
ducing a new configuration is just 0(L d ), or just 
proportional to the number of sites in the lat- 
tice. By comparison, in the unigrid approach this 
work is 0((logi)L d ) for 7 = 1 and 0(L d+lo ^"<) 
for 7 > 1. 

5.3. Gaussian model: Eliminating CSD 

The prime example of the "ideal situation" is 
the Gaussian model, with the Hamiltonian 

i^(« a ) = 5>? fc («?-u2) 2 , (5.4) 

0\fc> 

where the summation is over pairs of neighbor- 
ing sites j and k, and a" fc are non-negative cou- 
pling coefficients. This Hamiltonian could arise 
as a discretization of (3.2). Since the interpola- 
tion /j/i i s a linear operator, coarse grid Hamil- 
tonians (5.3) can easily be derived and will have 
again the form (5.4), except that the range of 
neighbors k (such that a 2h k 7^ 0) for each site 
j will depend on the interpolation order. In the 
particular Gaussian case the coefficients a 2h k of 
the level-2/i Hamiltonian E 2h will not depend on 
the current fine-grid configuration u h ; they will 
only depend on the next-finer grid coefficients 
oh k , and on the coefficients of the interpolation 



operator I^ h . Thus, for a given E a , the coars- 
ening process depends only on the choice of the 
interpolation operators, and so is also the effi- 
ciency of the entire multigrid cycle. 

To obtain the ideal efficiency in this case, the 
coarser levels should accurately sample all the 
components slow to change under the current- 
level Monte-Carlo process. This implies that ev- 
ery slowly changing configuration v h must have 
an approximate u coarse- grid representation" , i.e., 
an approximate configuration of the form /•^u 2 ' 1 ', 
and the two configurations should have approxi- 
mately the same energy. 

In case of smooth and isotropic coefficients 
(e.g., aj k depending only on the distance from j 
to k), the slow-to-converge components are sim- 
ply the smooth components, which can indeed 
be approximated by interpolants from a coarser 
grid. The requirement of approximating the en- 
ergy implies in this case that the order of in- 
terpolation should be at least 2, i.e., linear or 
multi-linear interpolation, such as given by (1.7). 
(The required order is in fact a special case of 
the rule (3.17) above.) Indeed, with such an in- 
terpolation, a multigrid V cycle (see Fig. 1) is so 
efficient that it is hard to measure any correla- 
tion between the susceptibilities before and after 
the cycle. (The exact value of the very small au- 
tocorrelation time r depends on the details of 
the simulation at the coarsest level, and is not 
important anyway.) 

A border case is the first order constant inter- 
polation Jg/,) defined by the shape function 

■■■,&) 

[l if all 9i - 1< Ci < 0i (5-5) 
I otherwise 

with any convenient 0\, . . . , 64. For any smooth 
function v h approximated by a coarse function 
u 2h , the energy of / 2 \ji 2 ' 1 is about twice that of 
v h . Hence, a component v a on the finest grid 
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which is so smooth that it effectively changed 
only on a coarse grid with meshsize 2 m a, say, 
will be represented for that coarse grid by an 
approximation which has an energy 2™ times 
its own energy. Therefore, that coarse grid will 
change such an approximation a change with size 
only 0(2~ m / 2 ) the probable size for changes of 
that smooth component. Hence, roughly 2 m vis- 
its to that coarse grid will be needed to (ran- 
domly) accumulate the actual probable size of 
a change. This means that any grid 2h = 2 m a 
should be visited at least twice per each visit to 
grid h = 2 rn ~ 1 a, i.e., a cycle index 7 > 2 must 
be used. 

Experiments indeed show that a W cycle 
(7 = 2) is enough to produce r = O(l), while 
a V cycle (7 = 1) is not. For dimension d > 2 
this algorithm with constant interpolation still 
eliminates CSD, since the work per cycle is still 
0(L d ) for any 7 < 2 d . 

When the "diffusion coefficients" a" k are anisotropic 
or wildly changing in size, the interpolation op- 
erators that produce good approximations to 
slowly changing components get more compli- 
cated. In cases of consistent anisotropy, semi- 
coarsening (e.g., decimation only in the direc- 
tion of strong couplings) should be used (cf. Sec. 
4.2.1 in If the coefficients change wildly, 

shape functions and coarsening strategies simi- 
lar to those in Sec. 3.5 above need be employed. 

It need perhaps be emphasized that the multi- 
grid process eliminates CSD from Gaussian mod- 
els with variable coefficients, which cannot gen- 
erally be done by Fourier methods. Also, un- 
like Fourier, general non-periodic domains can 
be handled with the same efficiency. 

Non-Gaussian models will be discussed in Sec. 
5.5. First, however, we will argue in the next sec- 
tion that elimination of CSD is not the only, per- 
haps not even the most important, issue. 



5.4- Eliminating the volume factor 

In usual statistical simulations on a d-dimensional 
grid of size L d , the amount of computer opera- 
tions needed to produce one statistically inde- 
pendent measurement is 0(L d ^ z ), where £ is the 
correlation length, which is normally 0(L). The 
exponent z is called the dynamic critical expo- 
nent. Typically z w 2 for point-by-point Monte- 
Carlo methods. Eliminating the critical slowing 
down, i.e., the factor £*, has been obtained by 
multigrid, as discussed above, and also by other 
methods. The important advantage of the multi- 
grid approach is that it can potentially drasti- 
cally reduce the volume factor L d as well. For 
Gaussian models it has been demonstrated [jl8| 
that this factor, too, can be completely elimi- 
nated. This is especially good news for high di- 
mensional (e.g., d — 4) problems. 

Statistical fluctuations in physical systems oc- 
cur on different scales: there are local fluctua- 
tions, intermediate-scale fluctuations, large-scale 
ones. Generally they are not independent of 
each other; especially, fluctuations at neighbor- 
ing scales can be highly correlated. But in many 
cases there is only weak correlations between de- 
viations at two widely different scales. 

At the coarse levels h > 2 m a of a multi- 
grid cycle, the finer-scale fluctuations are effec- 
tively frozen at their values in the current con- 
figurations u a , a 2a , . . . , it 2 '™". The coarser-scale 
fluctuations, if dependent only weakly on those 
frozen, can be averaged out on the coarse levels 
alone, before any return to the finer levels, by 
letting the coarse level Monte-Carlo simulation 
be suitably long and accompanied by a suitable 
sequence of measurements. Such averaging out of 
large scale fluctuations can be very efficient be- 
cause on these coarse grids such fluctuations are 
sampled rapidly (large changes per sweep) and 
cheaply (little work per sweep). 
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This of course depends on having a good 
enough representation of smooth components on 
the coarse grids. This for example is not the 
case if, in the Gaussian or other asymptoti- 
cally free models, constant interpolation is used 
in the inter-grid transfers. As explained in Sec. 
5.3, such an interpolation represent smooth com- 
ponents by far more energetic relatives, which 
therefore can move only a small fraction of the 
movements typical to the smooth components, 
not enough to allow averaging out of fluctua- 
tions. Linear interpolation, on the other hand, 
does represent the large-scale fluctuations on the 
correspondingly coarse grids by components of 
nearly the same energy, hence allow their full av- 
eraging by the coarse grid Monte-Carlo. 

The fine-scale fluctuations are largely self- 
averaged in each given fine-grid configuration. 
That is, local fluctuations at different parts of 
the grids are nearly independent, hence provide 
nearly independent samples. These samples are 
rapidly and cheaply changed by the Monte-Carlo 
process on that grid. 

Thus, generally, it takes only O(l) work to re- 
place any sample of a fluctuation on any given 
scale by the Monte-Carlo sweeps at the corre- 
sponding meshsize. Hence, if measurements ac- 
company the simulation closely enough, fluctua- 
tions on any scale can be averaged out very effi- 
ciently. 

The relative number of Monte-Carlo sweeps 
needed at each scale h depends on how much 
averaging-out is needed for the fluctuations at 
that scale, which is roughly 0(a 2 ), where ah is 
the average contribution of such fluctuations, in 
any one configuration, to the measurement devi- 
ation. This contribution depends on the desired 
observable. For some (perhaps less interesting) 
observables, such as energy, the contribution of 
finer scales dominate in such a way that most of 
the simulation work should be done on the finest 



grid. In such cases a cycle index 7 < 2 d should 
be used. 

For many (perhaps the more interesting) ob- 
servables, such as magnetization or susceptibility 
(except at d > 6), the contribution ah increases 
with the scale h in such a way that most of the 
Monte-Carlo work should be done on the coars- 
est grids. This is obtained by multigrid cycles 
with index 7 > 2 d . In such cases, the finer the 
level the more rare its activation, and the finest 
grid to be reached at all depends on the accuracy 
desired for the observable. This indeed should 
be the situation with any observable which has 
a thermodynamic limit, and the deviations ah 
should then more properly be defined as devia- 
tions from that limit, not from the average on 
any finite lattice. 

All these claims are strictly true for the Gaus- 
sian model with linear interpolation, for which 
the claims were precisely defined and confirmed, 
both by detailed numerical experiments and by 
mode analyses For example, it has been 

shown that a multigrid cycles with index 2 d < 
7 < 64 calculate the thermodynamic limit of the 
susceptibility to within accuracy e in 0(a 2 /e 2 ) 
computer operations, where a is the susceptibil- 
ity standard deviation. 

This efficiency — obtaining accuracy e in 
0(a 2 /e 2 ) operations — is the ideal statistical 
efficiency. It is just the same relation of com- 
putational cost to accuracy as in calculating by 
statistical trials any simple average, such as the 
frequency of "heads" in coin tossing. Obtaining 
this ideal statistical efficiency in the calculation 
of thermodynamic limits should generally be the 
goal of our algorithmic development. 

Note in the example described above that the 
use of 7 > 2 d contradicts the condition stated 
in Sec. 5.3 for eliminating CSD. This emphati- 
cally shows that the main issue is not the CSD, 
but the overall relation of computational cost to 
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obtained accuracy. To be sure, any multigrid al- 
gorithm that can achieve the ideal statistical ef- 
ficiency would also be able, upon change of 7, to 
eliminate CSD. 

The size L d of the finest grid that should be 
employed increases of course with the decrease 
of e, because one needs to have a grid for which 
the computed average is only distance e from 
its infinite-grid value. For some observables the 
dependence L = L(e) may be such that L(e) d 
increases faster than e~ 2 . Even in such cases 
the ideal statistical efficiency may still be at- 
tained, by the process of domain replication: On 
a domain with only 0(e _1 ) sites and with pe- 
riodic boundary conditions, the finest level with 
meshshize a is first equilibrated (fast, by a mul- 
tilevel process), then coarsened to meshsize 2a. 
Using the periodicity, the 2a lattice is now dupli- 
cated in each direction (to an overall 2 d factor in- 
crease in volume) , together with the Hamiltonian 
E 2a . Having equilibrated this wider and coarser 
lattice, one then proceeds to meshsize 4a, where 
the domain is duplicated again. And so on until 
the size of domain needed for accuracy e is ob- 
tain, at which our regular multigrid cycle, accom- 
panied with measurements at coarse levels, can 
be performed. On one hand this domain repli- 
cation process receives enough averaging infor- 
mation from the finest levels to have its coarser 
level Hamiltonians accurate to within O(e). On 
the other hand it employs, on coarse grids, the 
full size of a domain needed to produce the ob- 
servable to an accuracy e. 

5.5. N on- Gaussian models 

It is not at all clear that the same kind of ideal 
statistical efficiency can be obtained for interest- 
ing models far from the Gaussian. But a detailed 
examination of the looming difficulties indicates 
that they are not insurmountable. 



The first difficulty is in deriving the explicit 
Hamiltonian E 2h (u 2h ) that will satisfy (5.3). 
One advantage of the constant interpolation I^ h 
(cf. Sec. 5.3) is that it more easily yields such 
an explicit Hamiltonian. However, for the ideal 
efficiency, as explained above, linear interpola- 
tion seems necessary. For linear interpolation the 
explicit expression of E 2h , E 4h , etc. will get in- 
creasingly complicated, defeating the purpose of 
O(l) calculation per gridpoint in all coarse- level 
Monte-Carlo processes. A method to derive sim- 
ple but approximate explicit coarse-grid Hamil- 
tonian E 2h {u 2h ) has been described in Sec. 1.4, 
based on the observation that one is only inter- 
ested in having a good approximation for smooth 
u 2h , i.e., u 2h with small strains u 2h — u 2 , at 
any neighboring sites k and I. In case of gauge 
fields, the relevant strains have the form curl A 2h 
(cf. Sec. 4.1). The Taylor expansion in terms of 
small strains, such as (1.9), gives us also strain 
limits, i.e., bounds on the size of the strains un- 
der which the truncated expansion still yields a 
certain accuracy e t . 

In many cases one likes E 2h to preserve the 
topological properties (e.g., 27r-periodic depen- 
dence, as in (4.4)) of E h , so that it can al- 
low large-scale moves characteristic to the topol- 
ogy. Then the Taylor expansion, such as (1.19), 
should be approximated again by (e.g. trigono- 
metric) functions carrying this topology. The 
strain limits would guarantee that this approxi- 
mation, too, has 0(e t ) accuracy. Often, the ap- 
proximate E 2h derived this way would have the 
same functional form (hence the same programs) 
as E h , but with coefficients and an external field 
that depend on the current fine-grid configura- 
tion u h (which of course remains fixed through- 
out the simulation on grids 2h and coarser). 

Unlike the situation in Sec. 1.4, however, in 
the statistical context discussed here, the fact 
that E 2h is only approximated may destroy the 
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detailed balance of the simulation, putting its 
statistical fidelity in question. Detailed balance, 
though, is not sacred: like everything else sub- 
mitted to numerical simulations like the con- 
tinuum, the unboundedness of domains, or the 
infinite length of the Monte-Carlo chain, etc. 
- it can be approximated, provided one keeps 
a handle on the approximation. Such a handle, 
for example, is the e t introduced above: it deter- 
mines the above mentioned strain limits, which 
can be actually imposed, and one can always ex- 
amine how a further reduction of e t affects the 
calculated average. Generally, St will be lowered 
together with e, the target accuracy of the cal- 
culated average. Another handle on the accuracy 
of E 2h can be the order of interpolation l^ h . 

The lowering of £t together with e can be done 
without running into slowing down because a 
given move on a fixed level becomes smoother 
and smoother on the scale of the finest grid as 
the latter becomes finer and finer, hence this 
move will be acceptable for smaller and smaller 
e t as e is reduced further and further. Other in- 
gredients in avoiding slowing down are the "up- 
dates" , the Hamiltonian dependent interpolation 
and the stochastic appearance of disconnections, 
all described next. 

The Monte-Carlo process on each level is con- 
strained by the strain limits, inherited from all 
the coarsening stages leading from the finest level 
to the current one. Hence, if the strain limits are 
approached too closely at some points of some 
intermediate levels, the process on coarser levels 
will be completely paralyzed. To avoid this, the 
algorithm uses updates. An update is a return 
from any current level h to the next finer grid, 
h/2, introducing here the displacements implied 
by the current grid solution (step (iv) in Sec. 
1.6), and then coarsening again (with displace- 
ments u h and strains being now defined with 
respect to the updated fine grid solution u h / 2 ). 



This updates E 2h and "relieves" it from strains 
too close to their limits. An update can be done 
locally, wherever strain limits are approached. 
Or, more conveniently and sometimes more ef- 
fectively, it can be done globally, e.g., after each 
full Monte-Carlo sweep. In principle, while intro- 
ducing the displacements on grid h/2 during an 
update, some of that grid strains may approach 
their limits, requiring an update at level h/A. 
This, however, seldom happens and cannot cas- 
cade to ever finer levels, because moves on any 
level are very smooth on the scale of much finer 
levels, and will therefore affect their strains very 
little. In some models, once in a (long) while a 
coarse level may cause a "break", a discontinu- 
ous change that requires updating all the way to 
the finest scales, but such updates will hopefully 
be local and sufficiently rare. 

Due to such updates, large moves are per- 
missible on coarser grids. To make such moves 
also probable, the interpolation I^ h at each level 
should be so constructed so that its potential 
displacements are as probable as possible. This 
implies, for example, that if a certain variable u\ 
is coupled by E h more strongly to its neighbors 
in one direction than in another, then the inter- 
polation to site i should have a proportionately 
larger weight in that direction (cf. Sec. 3.5). It is 
true that on the finest level a the Hamiltonian is 
usually isotropic and hence I% a can be isotropic 
too; but on coarser levels, due to stochastic vari- 
ations in u h at each coarsening stage, the pro- 
duced Hamiltonian E 2h is no longer isotropic, 
hence anisotropic interpolation /J^ should cor- 
respondingly be introduced. This tends to create 
even stronger anisotropy at corresponding points 
of still coarser levels. 

Thus, stochastically, at points of sufficiently 
coarse grids, the interpolation may become heav- 
ily one-sided, approaching in fact the constant 
interpolation. The latter does not entail strain 
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limits. It also involves deletion of couplings in 
some directions. These together free the coarse 
levels to perform moves that introduce "topolog- 
ical" changes (vortices, instantons, etc.) to the 
configuration. 

To obtain the ideal statistical efficiency, an- 
other concern associated with non-Gaussian mod- 
els is the dependence between fluctuations at 
neighboring scales (see Sec. 5.4). The above 
blueprint does have the potential of dealing with 
that, because the operation at each level does 
involve frequent updates from the next finer 
level(s). In fact, the assertion made above that 
an update cannot cascade unboundedly to ever 
finer levels is exactly related to the assumption 
of weak dependence between far scales. 

5.6. Stochastic coarsening. Discrete models 

Note in the above outline that never de- 
pends on the current configuration u ; it only de- 
pends on the coefficients of E h . This is necessary 
for approaching detailed balance as St — > 0. On 
the other hand, the coefficients of E do depend 
on the current configuration u h / 2 . Thus I^u, and 
hence E 2h , will also depend on u h / 2 . This en- 
ables the shape of the large-scale moves to de- 
velop stochastically. Far from the Gaussian, such 
stochastic development of the collective modes, 
having them chosen by the system itself, is essen- 
tial for making them associated with enough free 
energy, hence probable. (This, incidentally, is ex- 
actly the important capability lacking in the un- 
igrid approach; cf. Sec. 5.1.) By contrast, on the 
one hand, an apriori large scale movement, in- 
considerate of the current fine scale fluctuations, 
is likely to contradict them at many spots and 
hence to increase the energy very much; it will 
thus be rejected by the Monte-Carlo process (or 
accepted with a very small, useless amplitude). 
On the other hand, large-scale moves which are 



directly based on the current configuration, will 
destroy statistical fidelity. 

This is most visible in discrete-state models, 
such as the Ising and, more generally, the Potts 
spin models, which are as far as one can get from 
the Gaussian. Consider for example the ferro- 
magnetic Ising model Hamiltonian 

E(s) = -J2 J ij s i s ii (Jij>0) ( 5 - 6 ) 

where Si = ±1 is the spin at site i of a d- 
dimensional lattice and the summation is over 
neighboring i and j. At some interesting (e.g., 
the critical) temperature T, any probable con- 
figuration would exhibit large regions of aligned 
spins. The only type of a large-scale Monte-Carlo 
step for this model is offering the flipping of a 
large block of spins, to be accepted or rejected 
according to the probability distribution (5.1). 
If the block is chosen apriori, independently of 
the current configuration, its boundary is un- 
likely to have much intersection with the current 
boundaries of regions of aligned spins. Therefore, 
flipping the block would most likely add many 
violated bonds (negative JijSiSj), which would 
increase the energy by much, hence will most 
probably be rejected. On the other hand, choos- 
ing a block with boundaries coinciding with the 
current boundaries of spin alignment would cre- 
ate statistical bias, favoring moves that increase 
magnetization. 

The solution to this dilemma is indeed stochas- 
tic coarsening. An example, now classical, is the 
Swendsen-Wang (SW) coarsening |7^| . This con- 
sists of a step-by-step blocking. At each step 
one positive bond Jij is 11 terminated^, i.e., re- 
placed by either (deleted bond) or oo (frozen 
bond, blocking Si and Sj together) in probabili- 
ties Pij and 1 — Pij respectively. In case of freez- 
ing, Si and Sj effectively become one spin, whose 
bonds to neighboring spins can easily be cal- 



38 



A. Brandt / Multigrid Methods in Lattice Field Computations 



culated, yielding a new Hamiltonian, still hav- 
ing the general form (5.6). It can be proved 
that if Pij = qij e-K.p(—JijSiSj/T), where s is the 
current (termination-time) configuration and 
does not depend on ~s, then simulating thereafter 
with the new Hamiltonian preserves the overall 
statistical equilibrium. At the next step a posi- 
tive bond of the (new) Hamiltonian is similarly 
terminated, and so on. If — exp(— Jy/T), 
then only spins currently having the same sign 
will be blocked together, so that current bound- 
aries of spin alignment will eventually become 
part of block boundaries, overcoming the above 
dilemma. 

In the original and most used version of the 
SW algorithm, the blocking steps continue until 
a Hamiltonian with no positive bond is reached. 
Each block of spins is now flipped in probabil- 
ity 1/2. This may be followed by several usual 
Monte-Carlo sweeps with the original Hamilto- 
nian (5.6), and then a new, similar sequence of 
stochastic blocking steps is made, starting from 
the original Hamiltonian. And so on. This algo- 
rithm proved very efficient: the dynamic criti- 
cal exponent z has been drastically reduced, al- 
though not quite to z = 0. (Recent measure- 
ments H indicate it is even lower than the orig- 
inal estimate z = .35.) 

Note that the stochasting blocking is not un- 
like the constant interpolation which happens 
to stochastically develop at coarse levels in the 
procedures described in Sec. 5.5. Indeed, ex- 
plicit stochastic steps in choosing the interpo- 
lation I% h can be added to those procedures. 
This may prove necessary wherever (e.g., at some 
coarse levels) the possible local shapes of slowly- 
changing components belong to several nearly- 
disjoint sectors, implying several discrete alter- 
natives in constructing I^ h . 

The SW algorithm, and its single cluster vari- 
ant ]7q| , have been cleverly generalized to many 



more models by embedding Ising variables in 
those models |7§], t 
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These procedures are not built hierarchically 
as the multiscale and multigrid algorithms de- 
scribed in our earlier sections, but often achieve 
comparable efficiency in reducing z. The expla- 
nation is that, due to the discreteness of the 
Ising variables, blocks are created of all sizes, 
thus producing moves on all scales. On the other 
hand, a more deliberate multigrid-like organiza- 
tion may produce two additional benefits. First, 
it can make the algorithm even more efficient in 
reducing the auto-correlation time r. Secondly, 
and more important, it may allow cheap collec- 
tion of many measurements at the coarse levels, 
possibly reducing or eliminating the volume fac- 
tor as well (cf. Sec. 5.4). We will now examine 
this possibility. 

5. 7. Multiscale blocking 

The SW algorithm described above can be 
modified into a hierarchical multiscale proce- 
dure in the following way. At the first level of 
coarsening, only a subset of bonds is terminated. 
This subset is chosen adaptively so that only 
blocks of size not greater than b are created (e.g., 
b = 2 or b = 2 d ). The resulting Hamiltonian E 1 
still includes many positive bonds ("live inter- 
actions"). In the second level of coarsening, ad- 
ditional bonds are similarly terminated, yielding 
a Hamiltonian E 2 . Etc. The Hamiltonian pro- 
duced at the €-th level still has the form 



(5.7) 



but each "level- £ spin" si = ±1 is actually a 
block of between 1 and b level- (£ — 1) spins s^ 1 
having the same sign (which will be taken as the 
sign of sf as well). The coarsest level Hamilto- 
nian is reached when no positive bond is left. 
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With these levels, the usual multigrid cy- 
cles can be applied; e.g., the cycles displayed 
in Fig. 1, re-interpreted as follows. Circles at 
level 2 l a stand for Monte-Carlo sweeps with 
the Hamiltonian E e . The downward arrow (the 
coarsening step) from level 2 a to 2 i+1 a repre- 
sents the blocking steps that create E l+1 . The 
upward arrow (uncoarsening) from level 2 e+1 a 
to 2 a means executing in terms of s f_1 the flips 
found in s l . 

It is easy to see that a V cycle (7 = 1) with 
no Monte-Carlo passes at coarse levels is ex- 
actly equivalent to the original SW algorithm. 
By choosing b > 2, a W cycle (7 = 2) will not 
be much more expensive, and experiments show 
that it significantly cut the auto-correlation time 
r @ > @ ■ The impression indeed was that such 
a W cycle completely eliminated CSD, yielding 
z = 0, but it has later been proved p2j that at 
least a certain version of this algorithm (where 
the bonds terminated first are the same at all cy- 
cles) still suffers a (very marginal) slowing down. 

However, as emphasized in Sec. 5.4, CSD is 
not the main issue. The question is how much 
computational work is needed per effectively- 
independent measurement. Consider for exam- 
ple measurements for the susceptibility (M 2 ), 
which can be taken at any level £ since Sfcsf. = 
S m s^ _1 = • • • = SjSi = M. Can one benefit from 
averaging over M 2 within the cycle? 

To sharpen this question, let us denote by 
Xo = (M 2 ) the true susceptibility, by do = 
((M 2 — Xo) 2 ) 1 ^ 2 the standard deviation from xo 
of M 2 at any single configuration, by xi the 
average of M 2 for the Hamiltonian E 1 and by 
°i = ((Xi — Xo) 2 ) 1 ^ 2 the standard deviation of 
Xi from xo - The question then boils down to this: 
is a 1 much smaller than ctq? Does <Jx/<Jq — > as 
L — > 00? 

The first answer to this question was disap- 
pointing. Two dimensional experiments near the 



critical temperature showed that as L increases, 
both a\ and Co remain proportional to xo! the 
ratio o~\ I ao is determined only by the fraction of 
bonds being terminated in creating E . In reduc- 
ing the number of degrees of freedom one looses 
not just fine-scale fluctuations, but large-scale 
ones as well. 

At first, this strong correlation between scales 
appears to be a necessary property of discrete- 
state models. But then, a similar situation is en- 
countered when constant interpolation is used 
even for the Gaussian model (cf. Sec. 5.3). So 
the question now is whether a better coarsening 
technique, capturing some features from linear 
interpolation, can be devised for Ising spins, so 
as to reduce o~\. The question is important since 
the Ising model, as an extreme case, can teach us 
what can be done in other models far from the 
Gaussian. 

At this point the answer, as we will see, is cer- 
tainly positive, although preliminary: there are 
so many possibilities, and the search has just 
begun. One obvious difference between constant 
and linear interpolation is that the latter relates 
a given variable to two neighbors, not one. Thus, 
our first attempt at a linear-like interpolation is 
to replace the two-spin SW coarsening with the 
following three spin coarsening (3SC), developed 
in collaboration with Dorit Ron. 

For simplicity we describe (and have developed 
and tested) only the case of uniform bonds (con- 
stant Jij); this is not essential, but introduces 
simplifying symmetries. Denote by f3 = Jij/T 
the uniform thermal binding between neighbors. 
Consider a spin sq with two neighbors, s_ and 
s+ say. The current Hamiltonian has the form 

—E = -/3s Q 3_ - f3s s + 

where the dots stand for any other terms. Three 
other Hamiltonians are offered as alternatives: 
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7pE\ = —oosqs^ — asos+ — ■ ■ • 

yi?2 = — CLSqS- — OOSqS+ — ■ ■ ■ 

±E 3 = -bs_s+ 

The oo value in E\ (E2) means that So and 
s_ (s + ) are blocked together. Note that in E3 
the two bonds between sq and its two neigh- 
bors are deleted, but a new direct bond is in- 
troduced between the neighbors themselves. One 
selects Ei with probability Pj (i = 1, 2, 3), where 
Pi + Pi + P3 = 1. To obtain detailed bal- 
ance, these probabilities are taken to depend 
on the current value of s_, so and s+ accord- 
ing to Table 1 — plus the obvious rule that 
Pi(-s_, -s , —s+) = Pi(s-,s ,s + ) — and the 
value of a and b are taken so that 

e 2a = (e 2/3 -e- 2 ' 3 )/(2-2p st ) 

e 2b = e -W/ p ^ 



S- so s + 


Pi 


Pi 


Pz 


+ + + 


1(1 -e-V) 


i(l-e-^) 


e -if3 


+ - + 








1 


+ + 


1-p* 





P* 


+ 





1-p, 


P* 



Table 1 



p* being a small positive parameter. We chose 
p* = .15, but other values in the range .05 < 
p* < .2 are perhaps as good. 

The detailed balance of this, and also that of 
SW and other coarsening schemes, is a special 
case of the following easily-proven theorem. 

Theorem (Kandel-Domany |5lf| ) . Replacing the 
Hamiltonian E by one of the Hamiltonians Ei , . . . , E 
in probabilities Pi(s), . . . , Pk(s) respectively, where 
s is the configuration at the time of replacement, 
preserves detailed balance if 

P i ( a ) = g je ^W- B ««)/ r , (5.8) 
where qi is independent of s. ■ 



We have tested 3SC on an L x L periodic grid 
by applying the coarsening step for all triplets 
s_, so and s + at grid positions (j, 2k — 1), (j, 2k) 
and (j, 2k + 1) respectively such that j + k is 
even. We compared it with an SW coarsening 
that terminated all the corresponding (so,s_) 
and (so,s+) bonds. Results at the critical tem- 
perature are summarized in Table 2. They show 
that for 3CS, unlike SW, the ratio ai/xo de- 
creases with L. This means that if the suscepti- 
bility is measured on the first coarse grid, with- 
out ever returning to the fine, the average error 
is small: it tends to as L increases. 



L 


XO 


CO 


01 


01 








SW 


3SC 


4 


12.2 




1.8 


.7 


8 


41.4 




7.2 


1.5 


16 


139.5 


56.8 


25.6 


4.0 


32 


470.2 


192.5 


81.6 


10.6 



Table 2 



The observation that has led to the construc- 
tion of 3SC is that the basic flaw in the SW coars- 
ening is the introduction of many deletions, usu- 
ally clustered along well-defined lines: the lines 
of current boundaries of spin alignment. These 
lines therefore exhibit in E 1 weakened couplings, 
and are thus likely to persist as boundaries of 
spin alignment also on coarse grids. This means 
strong correlation between different coarse grid 
configurations. In 3SC the introduction of such 
weakened-coupling lines is minimized. 

This is just a first attempt; it all may well be 
k ' done better. Observe that the blocks created by 
3SC are not necessarily continguous: the Hamil- 
tonian i?3 creates a bond between s_ and s+, so 
they latter may be blocked together without hav- 
ing the points in between, such as so, included 
in the block. More general schemes may create 
blocks that are not necessarily disjoint. An so 
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forth: the possibilities are many. 

It is not clear whether the ideal statistical ef- 
ficiency is always attainable. What has been es- 
tablished, we believe, is that it is possible to 
greatly benefit from making many measurements 
at the coarse levels of a multilevel Monte-Carlo 
algorithm, even in discrete-state models, if a suit- 
able coarsening scheme is used. 

6. Other Relevant Multilevel Techniques 

We briefly mention here several other multi- 
level techniques that are relevant to lattice field 
computations. 

Performing general integral transforms, or solv- 
ing integral and integro- differential equations, 
discretized on n grid points, have been shown to 
cost, using a multigrid structure, only 0(n) or 
O(nlogn) operations, even though they involve 
full nx n matrices ]2(| , |ll| . In particular this is 
true for performing Fourier transforms on non- 
uniform grids. An extension has been devised to 
transforms with oscillatory kernels |lq ]. 

The calculation of the n(n — 1) interactions be- 
tween n bodies (e.g., to obtain the residual forces 
in Sec. 1.1), can be performed in 0(n) operations 
by embedding in a multigrid structure |p"5| . 

Multilevel annealing methods have been shown 
as very effective for global optimization of sys- 
tems with a multitude of local optima and with 
multi-scale attraction basins, in which cases the 
usual simulated annealing method may be ex- 
tremely inefficient. This includes in particular 
ground-state calculations for discrete-state and 
frustrated Hamiltonians p5| , ^6fl . Work now is 
in progress to extend these multilevel annealing 
techniques to the calculation of ground states of 
many particle problems. 

Multilevel Monte-Carlo methods, similar to 
those described in Sec. 4, are also being devel- 



oped for many particle (e.g., atom) simulations. 
The particles are embedded in a lattice which al- 
low collective stochastic moves, somewhat simi- 
lar to the collective moves described in Sec. 1.4 
above. 

Finally, it has been demonstrated that the 
multigrid methodology can be used as a tool 
for directly deriving the macroscopic equations 
of a physical system. For example, in the case 
that the interactions (1.1) above are Lennard- 
Jones or similar atomic interactions, the equa- 
tions obtained on the coarse levels of the multi- 
grid structure described in Sec. 1.6 are essentially 
the equations of elasticity for that material. In 
this example a particle problem has given rise to 
a continuum macroscopic description, expressed 
as partial differential equations. The reverse can 
also happen: starting with PDEs, such as wave 
equations, the macroscopic description may end 
up being that of a ray or a particle (cf. Sec. 3.7). 
Sometimes, a statistical microscopic system can 
give rise to a deterministic macroscopic system, 
or vice versa. 

The derivation of macroscopic equations for 
statistical systems may be a natural continuation 
of the approach described at the end of Sec. 5.4, 
where very fine levels may be used only at very 
small subdomains. 

Generally, the macroscopic equations obtained 
this way are expected to be much simpler than 
those derived by group renormalization meth- 
ods. This is directly due to the slight "iterative- 
ness" left in the process by the updates described 
above, which relieves the coarse level from the 
need to describe in one set of equations all the 
possible fine-level situations. In many cases the 
need for such updates may tend to disappear on 
sufficiently coarse levels. Even when this is not 
the case, an activation of much finer levels dur- 
ing large-scale (coarse level) simulations will only 
rarely and locally be needed. 
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